Huntingtin DNA, protein and uses thereof.



(57) A novel gene, huntingtin, is described, encoding
huntingtin protein, together with recombinant vectors and
hosts capable of expressing huntingtin.
Methods for the diagnosis and treatment of 
Huntington's disease are also provided. 
Field of the Invention



The invention is in the field of the detection and treatment of genetic diseases. Specifically, the invention
is directed to the huntingtin gene (also called the IT15 gene), the huntingtin protein encoded by such gene, and
the use of this gene and protein in assays (1) for the detection of a predisposition to develop Huntington's disease,
(2) for the diagnosis of Huntington's disease, (3) for the treatment of Huntington's disease, and (4) for
monitoring the course of such treatment.

Background of the Invention



Huntington's disease (HD) is a progressive neurodegenerative disorder characterized by motor disturbance,
cognitive loss and psychiatric manifestations (Martin and Gusella, N. Engl. J. Med. 315:1267-1276
(1986)). It is inherited in an autosomal dominant fashion, and affects about 1/10,000 individuals in most populations
of European origin (Harper, P.S. et al., in Huntington's disease, W.B. Saunders, Philadelphia, 1991).
The hallmark of HD is a distinctive choreic movement disorder that typically has a subtle, insidious onset in
the fourth to fifth decade of life and gradually worsens over a course of 10 to 20 years until death. Occasionally,
HD is expressed in juveniles, typically manifesting with more severe symptoms including rigidity and a more
rapid course. Juvenile onset of HD is associated with a preponderance of paternal transmission of the disease
allele. The neuropathology of HD also displays a distinctive pattern, with selective loss of neurons that is most
severe in the caudate and putamen regions of the brain. The biochemical basis for neuronal death in HD has
not yet been explained, and there is consequently no treatment effective in delaying or preventing the onset
and progression of this devastating disorder.

The genetic defect causing HD was assigned to chromosome 4 in 1983 in one of the first successes of
linkage analysis using polymorphic DNA markers in man (Gusella et al., Nature 306:234-238 (1983)). Since that
time, we have pursued a location cloning approach to isolating and characterizing the HD gene based on progressively
refining its localization (Gusella, FASEB J. 3:2036-2041 (1989); Gusella, Adv. Hum. Genet. 20:125-151
(1991)). Among other work, this has involved the generation of new genetic markers in the region by a
number of techniques (Pohl et al., Nucleic Acids Res. 16:9185-9198 (1988); Whaley et al., Somat. Cell Mol.
Genet. 17:83-91 (1991); MacDonald et al., J. Clin. Inv. 84:1013-1016 (1989)), the establishment of genetic
(MacDonald et al., Neuron 3:183-190 (1989); Allitto et al., Genomics 9:104-112 (1991)) and physical maps of
the implicated regions (Bucan et al., Genomics 6:1-15 (1990); Bates et al., Nature Genet. 1:180-187 (1992);
Doucette-Stamm et al., Somat. Cell Mol. Genet. 17:471-480 (1991); Altherr et al., Genomics 13:1040-1046
(1992)), the cloning of the 4p telomere of an HD chromosome in a YAC clone (Bates et al., Am. J. Hum. Genet.
46:762-775 (1990); Youngman et al., Genomics 14:350-356 (1992)), the establishment of YAC [yeast artificial
chromosome] (Bates et al., Nature Genet. 1:180-187 (1992)) and cosmid (Baxendale et al., in preparation) contigs
(a series of overlapping clones which together form a whole sequence) of the candidate region, as well
as the analysis and characterization of a number of candidate genes from the region (Thompson et al., Genomics
11:1133-1142 (1991); Taylor et al., Nature Genet. 2:223-227 (1992); Ambrose et al., Hum. Mol. Genet.
1:697-703 (1992)). Analysis of recombination events in HD kindreds has identified a candidate region of 2.2
Mb, between D4S10 and D4S98 in 4p16.3, as the most likely position of the HD gene (MacDonald et al., Neuron
3:183-190 (1989); Bates et al., Am. J. Hum. Genet. 49:7-16 (1991); Snell et al., Am. J. Hum. Genet. 51:357-362
(1992)). Investigations of linkage disequilibrium between HD and DNA markers in 4p16.3 (Snell et al., J.
Med. Genet. 26:673-675 (1989); Theilman et al., J. Med. Genet. 26:676-681 (1989)) have suggested that multiple
mutations have occurred to cause the disorder (MacDonald et al., Am. J. Hum. Genet. 49:723-734 (1991)).
However, haplotype analysis using multi-allele markers has indicated that at least 1/3 of HD chromosomes are
ancestrally related (MacDonald et al., Nature Genet. 1:99-103 (1992)). The haplotype shared by these HD
chromosomes points to a 500 kb segment between D4S180 and D4S182 as the most likely site of the genetic
defect.

Targeting this 500 kb region for saturation with gene transcripts, exon amplification has been used as a
rapid method for obtaining candidate coding sequences (Buckler et al., Proc. Natl. Acad. Sci. USA 88:4005-4009
(1991)). This strategy has previously identified three genes: the α-adducin gene (ADDA) (Taylor et al.,
Nature Genet. 2:223-227 (1992)); a putative novel transporter gene (IT10C3) in the distal portion of this segment;
and a novel G protein-coupled receptor kinase gene (IT11) in the central portion (Ambrose et al., Hum.
Mol. Genet. 1:697-703 (1992)). However, no defects implicating any of these genes as the HD locus have been
found.
Summary of the Invention

A large gene, termed herein "huntingtin" or "IT15," has been identified that spans about 210 kb and encodes
a previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a polymorphic
(CAG)n trinucleotide repeat with at least 17 alleles in the normal population, varying from 11 to about 34 CAG
copies. On HD chromosomes, the length of the trinucleotide repeat is substantially increased, for example, to
about 37 to at least 73 copies, and shows an apparent correlation with age of onset; the longest segments are
detected in juvenile HD cases. The instability in length of the repeat is reminiscent of similar trinucleotide repeats
in the fragile X syndrome and in myotonic dystrophy (Suthers et al., J. Med. Genet. 29:761-765 (1992)).
The presence of an unstable, expandable trinucleotide repeat on HD chromosomes in the region of strongest
linkage disequilibrium with the disorder suggests that this alteration underlies the dominant phenotype of HD,
and that huntingtin encodes the HD gene.

The invention is directed to the protein huntingtin, DNA and RNA encoding this protein, and uses thereof.

Accordingly, in a first embodiment, the invention is directed to purified preparations of the protein huntingtin,
preferably substantially cell-free.

In a further embodiment, the invention is directed to a recombinant construct containing DNA or RNA en- 
coding huntingtin. 

In a further embodiment, the invention is directed to a vector containing such huntingtin-encoding nucleic 
acid. 

In a further embodiment, the invention is directed to a host transformed with such vector.

In a further embodiment, the invention is directed to a method for producing huntingtin from such recom- 
binant host. 

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease using such
huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method for treating Huntington's disease using such
huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or presymptomatic
patient, such method comprising providing a functional huntingtin gene with a (CAG)n repeat of
the normal range of 11-34 copies to the desired cells of such patient in need of such treatment, in a manner
that permits the expression of the huntingtin protein provided by such gene, for a time and in a quantity sufficient
to provide the huntingtin function to the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or presymptomatic
patient, such method comprising providing a functional huntingtin antisense gene to the desired
cells of such patient in need of such treatment, in a manner that permits the expression of huntingtin antisense
RNA provided by such gene, for a time and in a quantity sufficient to inhibit huntingtin mRNA expression in
the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or presymptomatic
patient, such method comprising providing a functional huntingtin gene to the cells of such patient
in need of such gene; in one embodiment the functional huntingtin gene contains a (CAG)n repeat of between
11 and 34 copies.

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease or a 
predisposition to develop Huntington's disease in a patient, such method comprising determining the number 
of (CAG)n repeats present in the huntingtin gene in such patient and especially in the affected tissue of such 
patient. 

In a further embodiment, the invention is directed to a method for treating Huntington's disease in a patient,
such method comprising decreasing the number of huntingtin (CAG)n repeats in the huntingtin gene in the desired
cells of such patient.
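
The diagnostic embodiment above rests on counting (CAG)n repeat units and comparing the count with the ranges stated in this summary. As a minimal sketch only, assuming an already-determined allele sequence is available as a string, the comparison could look like the following. The function names, the category labels, the treatment of counts outside the stated ranges as "indeterminate", and the flanking sequence in the usage line are all assumptions made for illustration and are not part of the disclosure.

```python
import re

# Ranges taken from the text: normal alleles carry 11 to about 34 CAG copies,
# HD chromosomes about 37 or more. Everything else here is illustrative.
NORMAL_MIN = 11
NORMAL_MAX = 34
EXPANDED_MIN = 37


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat(n: int) -> str:
    """Map a repeat count onto the ranges described in the text."""
    if n >= EXPANDED_MIN:
        return "expanded"
    if NORMAL_MIN <= n <= NORMAL_MAX:
        return "normal"
    # Counts below 11 or of 35-36 fall outside both stated ranges.
    return "indeterminate"


# Hypothetical flanking sequence; only the repeat count matters here.
allele = "GGCCTTC" + "CAG" * 18 + "CCGCCA"
count = longest_cag_run(allele)
print(count, classify_repeat(count))
```

The sketch counts only uninterrupted CAG units; a real assay reads the repeat length from PCR product size or sequencing, as described for the Figures below.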

Brief Description of the Drawings 


FIGURE 1. Long-range restriction map of the HD candidate region. A partial long range restriction map of
4p16.3 is shown (adapted from Lin et al., Somat. Cell Mol. Genet. 17:481-488 (1991)). The HD candidate region
determined by recombination events is depicted as a hatched line between D4S10 and D4S98. The portion
of the HD candidate region implicated as the site of the defect by linkage disequilibrium haplotype analysis
(MacDonald et al., Nature Genet. 1:99-103 (1992)) is shown as a filled box. Below the map schematic, the region
from D4S180 to D4S182 is expanded to show the cosmid contig (averaging 40 kb/cosmid). The genomic coverage
and, where known, the transcriptional orientation (arrow 5' to 3') of the huntingtin (IT15), IT11, IT10C3
and ADDA genes are also shown. Locus names above the map denote selected polymorphic markers that have
been used in HD families. The positions of D4S127 and D4S95, which form the core of the haplotype in the region
of maximum disequilibrium, are also shown in the cosmid contig. Restriction sites are given for Not I (N), Mlu
I (M) and Nru I (R). Sites displaying complete digestion are shown in boldface while sites subject to frequent
incomplete digestion are shown as lighter symbols. Brackets around the "N" symbols indicate the presence of
additional clustered Not I sites.

FIGURE 2. Northern blot analysis of the huntingtin (IT15) transcript. Results of the hybridization of IT15A
to a Northern blot of RNA from normal (lane 1) and HD homozygous (lanes 2 and 3) lymphoblasts are shown.
A single RNA of about 11 kb was detected in all three samples, with slight apparent variations being due to
unequal RNA concentrations. The HD homozygotes are independent, deriving from a large American family
(lane 2) and the large Venezuelan family (lane 3), respectively. The Venezuelan HD chromosome has a
4p16.3 haplotype of "5 2 2" defined by a (GT)n polymorphism at D4S127 and VNTR and TaqI RFLPs at D4S95.
The American homozygote carries the most common 4p16.3 haplotype found on HD chromosomes, "2 111"
(MacDonald et al., Nature Genet. 1:99-103 (1992)).

FIGURE 3. Schematic of cDNA clones defining the IT15 transcript. Five cDNAs are represented under a
schematic of the composite IT15 sequence. The thin line corresponds to untranslated regions. The thick line
corresponds to coding sequence, assuming initiation of translation at the first Met codon in the open reading
frame. Stars mark the positions of the following exon clones 5' to 3': DL83D3-8, DL83D3-1, DL228B6-3,
DL228B6-5, DL228B6-13, DL89F7-3, DL178H4-6, DL118F5-U and DL134B9-U4.

The composite sequence was derived as follows. From 22 bases 3' to the putative initiator Met ATG, the
sequence was compiled from the cDNA clones and exons shown. There are 9 bases of sequence intervening
between the 3' end of IT16B and the 5' end of IT15B. These were obtained by PCR amplification of first strand cDNA
and sequencing of the PCR product. At the 5' end of the composite sequence, the cDNA clone IT16C terminates
27 bases upstream of the (CAG)n. However, when IT16C was identified, we had already generated genomic
sequence surrounding the (CAG)n in an attempt to generate new polymorphisms. This sequence matched the
IT16C sequence, and extended it 337 bases upstream, including the apparent Met initiation codon.

FIGURE 4. Composite sequence of huntingtin (IT15) (SEQ ID NO:5 and SEQ ID NO:6). The composite DNA
sequence of huntingtin (IT15) is shown (SEQ ID NO:5). The predicted protein product (SEQ ID NO:6) is shown
below the DNA sequence, based on the assumption that translation begins at the first in-frame methionine of
the long open reading frame.

FIGURE 5. DNA sequence analysis of the (CAG)n repeat. The DNA sequence shown in panels 1, 2 and 3 demonstrates
the variation in the (CAG)n repeat detected in normal cosmid L191F1 (1), cDNA IT16C (2) and HD
cosmid GUS72-2130 (3). Panels 1 and 3 were generated by direct sequencing of cosmid subclones using the following
primer (SEQ ID NO:1):

5' GGC GGG AGA CCG CCA TGG CG 3'.

Panel 2 was generated using the pBSKII T7 primer (SEQ ID NO:2): 

5' AAT ACG ACT CAC TAT AG 3'. 

FIGURE 6. PCR analysis of the (CAG)n repeat in a Venezuelan HD sibship with some offspring displaying
juvenile onset. Results of PCR analysis of a sibship in the Venezuela HD pedigree are shown. Affected individuals
are represented by shaded symbols. Progeny are shown as triangles for confidentiality. AN1, AN2 and
AN3 mark the positions of the allelic products from normal chromosomes. AE marks the range of PCR products
from the HD chromosome. The intensity of background constant bands, which represent a useful reference
for comparison of the above PCR products, varies with slight differences in PCR conditions. The PCR products
from cosmids L191F1 and GUS72-2130 are loaded in lanes 12 and 13 and have 18 and 48 CAG repeats, respectively.

FIGURE 7. PCR analysis of the (CAG)n repeat in a Venezuelan HD sibship with offspring homozygous for
the same HD haplotype. Results of PCR analysis of a sibship from the Venezuela HD pedigree in which both
parents are affected by HD are shown. Progeny are shown as triangles for confidentiality and no HD diagnostic
information is given, to preserve the blind status of investigators in the Venezuelan Collaborative Group. AN1
and AN2 mark the positions of the allelic products from normal parental chromosomes. AE marks the range
of PCR products from the HD chromosome. The PCR products from cosmids L191F1 and GUS72-2130 are
loaded in lanes 29 and 30 and have 18 and 48 CAG repeats, respectively.

FIGURE 8. PCR analysis of the (CAG)n repeat in members of an American family with an individual homozygous
for the major HD haplotype. Results of PCR analysis of members of an American family segregating
the major HD haplotype are shown. AN marks the range of normal alleles; AE marks the range of HD alleles. Lanes 1, 3,
4, 5, 7 and 8 represent PCR products from related HD heterozygotes. Lane 2 contains the PCR products from
a member of the family homozygous for the same HD chromosome. Lane 6 contains PCR products from a
normal individual. Pedigree relationships and affected status are not presented to preserve confidentiality. The
PCR products from cosmids L191F1 and GUS72-2130 (which was derived from the individual represented in
lane 2) are loaded in lanes 9 and 10 and have 18 and 48 CAG repeats, respectively.

FIGURES 9 and 10. PCR analysis of the (CAG)n repeat in two families with supposed new mutation causing
HD. Results of PCR analysis of two families in which sporadic HD cases representing putative new mutants
are shown. Individuals in each pedigree are numbered by generation (Roman numerals) and order in the pedigree.
Triangles are used to protect confidentiality. Filled symbols indicate symptomatic individuals. The different
chromosomes segregating in the pedigree have been distinguished by extensive typing with polymorphic
markers in 4p16.3 and have been assigned arbitrary numbers shown above the gel lanes. The starred chromosomes
(3 in Figure 9, 1 in Figure 10) represent the presumed HD chromosome. AN denotes the range of
normal alleles; AE denotes the range of alleles present in affected individuals and in their unaffected relatives
bearing the same chromosomes.

FIGURE 11. Comparison of (CAG)n Repeat Unit Number on Control and HD Chromosomes. Frequency
distributions are shown for the number of (CAG)n repeat units observed on 425 HD chromosomes from 150
independent families, and from 545 control chromosomes.

FIGURE 12. Comparison of (CAG)n Repeat Unit Number on Maternally and Paternally Transmitted HD
Chromosomes. Frequency distributions are shown for the 134 and 161 HD chromosomes from Figure 11 known
to have been transmitted from the mother (Panel A) and father (Panel B), respectively. The two distributions
differ significantly based on a t-test (t272.3 = 5.34, p<0.0001).

FIGURE 13. Comparison of (CAG)n Repeat Unit Number on HD Chromosomes from Three Large Families
with Different HD Founders. Frequency distributions are shown for 75, 25 and 35 HD chromosomes from the
Venezuelan HD family (Panel A) (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler, N.S., et al., Nature
326:194-197 (1987)), Family Z (Panel B) and Family D (Panel C) (Folstein, S.E., et al., Science 229:776-779
(1985)), respectively. The Venezuelan distribution did not differ from the overall HD chromosome distribution
in Figure 11 (t79.7 = 1.58, p<0.12). Both Family Z and Family D did produce distributions significantly different
from the overall HD distribution (t42^=6.73, p<0.0001 and t458=2.90, p<0.004, respectively).

FIGURE 14. Relationship of (CAG)n Repeat Length in Parents and Corresponding Progeny. Repeat length
on the HD chromosome in mothers (Panel A) or fathers (Panel B) is plotted against the repeat length in the
corresponding offspring. A total of 25 maternal transmissions and 37 paternal transmissions were available
for typing.

FIGURE 15. Amplification of the HD (CAG)n Repeat From Sperm and Lymphoblast DNA. DNA from sperm
(S) and lymphoblasts (L) for 5 members (pairs 1-5) of the Venezuelan HD pedigree aged 24-30 was used for
PCR amplification of the HD (CAG)n repeat. The lower band in each lane derives from the normal chromosome.

FIGURE 18. Relationship of Repeat Unit Length with Age of Onset. Age of onset was established for 234 
diagnosed HD gene carriers and plotted against the repeat length observed on both the HD and normal chro- 
mosomes in the corresponding lymphoblast lines. 

Detailed Description of the Invention

In the following description, reference will be made to various methodologies known to those of skill in the
art of molecular genetics and biology. Publications and other materials setting forth such known methodologies
to which reference is made are incorporated herein by reference in their entireties as though set forth in full.

The IT15 gene described herein is a gene from the proximal portion of the 500 kb segment between human
chromosome 4 markers D4S180 and D4S182. The huntingtin gene spans about 210 kb of DNA and encodes
a previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a polymorphic
(CAG)n trinucleotide repeat with at least 17 alleles in the normal human population, where the repeat number
varies from 11 to about 34 CAG copies in such alleles. This is the gene of the human chromosome that, as
shown herein, suffers the presence of an unstable, expanded number of CAG trinucleotide repeats in Huntington's
disease patients, such that the number of CAG repeats in the huntingtin gene increases to a range
of 37 to at least 86 copies. These results are the basis of a conclusion that the huntingtin gene encodes a protein
called "huntingtin," and that in such huntingtin gene the increase in the number of CAG repeats to a range of
greater than about 37 repeats is the alteration that underlies the dominant phenotype of Huntington's disease.

As used herein, the huntingtin gene is also called the Huntington's disease gene.

It is to be understood that the description below is applicable to any gene in which a CAG repeat within
the gene is amplified in an aberrant manner resulting in a change in the regulation, localization, stability or
translatability of the mRNA containing such amplified CAG repeat that is transcribed from such gene.
I. Cloning Of Huntingtin DNA And Expression Of Huntingtin Protein

The identification of huntingtin DNA and protein as the altered gene in Huntington's disease patients is
exemplified below. In addition to utilizing the exemplified methods and results for the identification of deletions
of the huntingtin gene in Huntington's disease patients, and for the isolation of the native human huntingtin
gene, the sequence information presented in Figure 4 represents a nucleic acid and protein sequence that,
when inserted into a linear or circular recombinant nucleic acid construct such as a vector, and used to transform
a host cell, will provide copies of huntingtin DNA and huntingtin protein that are useful sources for the
native huntingtin DNA and huntingtin protein for the methods of the invention. Such methods are known in the
art and are briefly outlined below.

The process for genetically engineering the huntingtin coding sequence, for expression under a desired
promoter, is facilitated through the cloning of genetic sequences which are capable of encoding such huntingtin
protein. Such cloning technologies can utilize techniques known in the art for construction of a DNA sequence
encoding the huntingtin protein, such as, for example, polymerase chain reaction technologies utilizing the huntingtin
sequence disclosed herein to isolate the huntingtin gene anew, or an allele thereof that varies in the
number of CAG repeats in such gene, or polynucleotide synthesis methods for constructing the nucleotide sequence
using chemical methods. Expression of the cloned huntingtin DNA provides huntingtin protein.

As used herein, the term "genetic sequences" is intended to refer to a nucleic acid molecule of DNA or
RNA, preferably DNA. Genetic sequences that are capable of being operably linked to DNA encoding huntingtin
protein, so as to provide for its expression and maintenance in a host cell, are obtained from a variety of sources
including commercial sources, genomic DNA, cDNA, synthetic DNA, and combinations thereof. Since the genetic
code is universal, it is to be expected that any DNA encoding the huntingtin amino acid sequence of the
invention will be useful to express huntingtin protein in any host, including prokaryotic (bacterial) hosts and eukaryotic
hosts (plants, mammals (especially human), insects, yeast, and especially any cultured cell populations).
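
The point about the universal genetic code can be made concrete with a small worked example. The sketch below, which is an illustration and not part of the disclosed methods, builds the standard codon table and shows that two different DNA sequences encode the same short peptide; in particular, both CAG and CAA specify glutamine. The function name and the two example sequences are hypothetical.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order
# for each of the three codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding strand, reading complete codons from position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical sequences that differ at the DNA level.
print(translate("ATGCAGCAGCAG"))
print(translate("ATGCAACAACAA"))
```

Both calls yield a methionine followed by three glutamines, which is why a synthetic or codon-altered DNA can still encode the same amino acid sequence.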

If it is desired to select anew a gene encoding huntingtin from a library that is thought to contain a huntingtin
gene, such library can be screened and the desired gene sequence identified by any means which specifically
selects for a sequence coding for the huntingtin gene or expressed huntingtin protein, such as, for example,
a) by hybridization (under stringent conditions for DNA:DNA hybridization) with an appropriate huntingtin DNA
probe(s) containing a sequence specific for the DNA of this protein, such sequence being that provided in Figure
4 or a functional derivative thereof, that is, a shortened form that is of sufficient length to identify a clone
containing the huntingtin gene, or b) by hybridization-selected translational analysis in which native huntingtin
mRNA which hybridizes to the clone in question is translated in vitro and the translation products are further
characterized for the presence of a biological activity of huntingtin, or c) by immunoprecipitation of a translated
huntingtin protein product from the host expressing the huntingtin protein.

When a human allele does not encode the identical sequence to that of Figure 4, it can be isolated and
identified as being huntingtin DNA using the same techniques used herein, and especially PCR techniques to
amplify the appropriate gene with primers based on the sequences disclosed herein. Many polymorphic probes
useful in the fine localization of genes on chromosome 4 are known and available (see, for example,
"ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition, 1991, pages
4-6). For example, useful D4S10 probes are clone designations pTV20 (ATCC 57605 and 57604), H5.52 (ATCC
61107 and 61106) and F5.53 (ATCC 61108).

Human chromosome 4-specific libraries are known in the art and available from the ATCC for the isolation
of probes ("ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition,
1991, pages 72-73); for example, LL04NS01 and LL04NS02 (ATCC 57719 and ATCC 57718) are useful for
these purposes.

It is not necessary to utilize the exact vector constructs exemplified in the invention; equivalent vectors
can be constructed using techniques known in the art. For example, the sequence of the huntingtin DNA is
provided herein (see Figure 4) and this sequence provides the specificity for the huntingtin gene; it is only
necessary that a desired probe contain this sequence, or a portion thereof sufficient to provide a positive indication
of the presence of the huntingtin gene.
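
The idea that a probe need only contain a sufficiently specific portion of the sequence can be illustrated in silico. The following sketch is a simplified analogy, not a model of hybridization chemistry: it reports exact matches of a probe on either strand of a clone sequence. The function names and the clone sequence are hypothetical; the probe used in the usage line is the primer of SEQ ID NO:1 given above, written without spaces.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_hits(clone: str, probe: str) -> list:
    """Return (strand, position) for each exact occurrence of the probe.

    "+" means the probe sequence itself occurs in the clone as written;
    "-" means its reverse complement occurs, i.e. the probe matches the
    opposite strand. Positions are 0-based on the clone as written.
    """
    clone = clone.upper()
    hits = []
    for strand, query in (("+", probe.upper()), ("-", reverse_complement(probe))):
        start = clone.find(query)
        while start != -1:
            hits.append((strand, start))
            start = clone.find(query, start + 1)
    return hits


PROBE = "GGCGGGAGACCGCCATGGCG"        # SEQ ID NO:1 from the text
clone = "TTT" + PROBE + "AAA"          # hypothetical clone sequence
print(probe_hits(clone, PROBE))
```

Real hybridization tolerates mismatches depending on stringency, so exact string matching is only a conservative stand-in for the specificity requirement described in the text.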

Huntingtin genomic DNA may or may not include naturally occurring introns. Moreover, such genomic DNA
can be obtained in association with the native huntingtin 5' promoter region of the gene sequences and/or with
the native huntingtin 3' transcriptional termination region.

Such huntingtin genomic DNA can also be obtained in association with the genetic sequences which encode
the 5' non-translated region of the huntingtin mRNA and/or with the genetic sequences which encode
the huntingtin 3' non-translated region. To the extent that a host cell can recognize the transcriptional and/or
translational regulatory signals associated with the expression of huntingtin mRNA and protein, then the
5' and/or 3' non-transcribed regions of the native huntingtin gene, and/or the 5' and/or 3' non-translated regions
of the huntingtin mRNA can be retained and employed for transcriptional and translational regulation.

Genomic DNA can be extracted and purified from any host cell, especially a human host cell possessing
chromosome 4, by means well known in the art. Genomic DNA can be shortened by means known in the art,
such as physical shearing or restriction digestion, to isolate the desired huntingtin gene from a chromosomal
region that otherwise would contain more information than necessary for the utilization of the huntingtin gene
in the hosts of the invention. For example, restriction digestion can be utilized to cleave the full-length sequence
at a desired location. Alternatively, or in addition, nucleases that cleave from the 5'-end of a DNA molecule
can be used to digest a certain sequence to a shortened form, the desired length then being identified
and purified by polymerase chain reaction technologies, gel electrophoresis, and DNA sequencing. Such nucleases
include, for example, Exonuclease III and Bal31. Other nucleases are well known in the art.
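
Cleaving a sequence at a desired location by restriction digestion can be sketched computationally. The example below is a minimal illustration under stated assumptions: it treats the DNA as a linear top strand, cuts at every occurrence of a recognition site, and ignores overhangs, methylation and incomplete digestion. The function name and the input sequence are hypothetical. The site used is that of Not I (one of the enzymes mapped in Figure 1), whose recognition sequence GCGGCCGC is cleaved after the second base on the top strand.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (top strand only).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


NOT_I_SITE = "GCGGCCGC"   # Not I recognition sequence
NOT_I_OFFSET = 2          # GC^GGCCGC on the top strand

# Hypothetical sequence containing a single Not I site.
print(digest("AAAAGCGGCCGCTTTT", NOT_I_SITE, NOT_I_OFFSET))
```

A sequence lacking the site is returned as a single fragment, which mirrors the use of rare-cutting enzymes for long-range mapping.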

Alternatively, if it is known that a certain host cell population expresses huntingtin protein, then cDNA tech- 
niques known in the art can be utilized to synthesize a cDNA copy of the huntingtin mRNA present in such 
population. 

For cloning the genomic or cDNA nucleic acid that encodes the amino acid sequence of the huntingtin protein
into a vector, the DNA preparation can be ligated into an appropriate vector. The DNA sequence encoding
huntingtin protein can be inserted into a DNA vector in accordance with conventional techniques, including
blunt-ending or staggered-ending termini for ligation, restriction enzyme digestion to provide appropriate termini,
filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining,
and ligation with appropriate ligases. Techniques for such manipulations are well known in the art.

When the huntingtin DNA coding sequence and an operably linked promoter are introduced into a recipient
eukaryotic cell (preferably a human host cell) as a non-replicating, non-integrating molecule, the expression
of the encoded huntingtin protein can occur through the transient (nonstable) expression of the introduced sequence.

Preferably the coding sequence is introduced on a DNA molecule, such as a closed circular or linear molecule
that is capable of autonomous replication. If integration into the host chromosome is desired, it is preferable
to use a linear molecule. If stable maintenance of the huntingtin gene is desired on an extrachromosomal
element, then it is preferable to use a circular plasmid form, with the appropriate plasmid element for
autonomous replication in the desired host.

The desired gene construct, providing a gene coding for the huntingtin protein, and the necessary regulatory
elements operably linked thereto, can be introduced into a desired host cell by transformation, transfection,
or any method capable of providing the construct to the host cell. A marker gene for the detection of
a host cell that has accepted the huntingtin DNA can be on the same vector as the huntingtin DNA or on a
separate construct for cotransformation with the huntingtin coding sequence construct into the host cell. The
nature of the vector will depend on the host organism.

Suitable selection markers will depend upon the host cell. For example, the marker can provide biocide 
resistance, e.g., resistance to antibiotics, or heavy metals, such as copper, or the like. 

Factors of importance in selecting a particular plasmid or viral vector include: the ease with which recipient
cells that contain the vector can be recognized and selected from those recipient cells which do not contain
the vector; the number of copies of the vector which are desired in a particular host; and whether it is desirable
to be able to "shuttle" the vector between host cells of different species.

When it is desired to use S. cerevisiae as a host for a shuttle vector, preferred S. cerevisiae yeast plasmids 
include those containing the 2-micron circle, etc., or their derivatives. Such plasmids are well known in the art 
and are commercially available. 

Oligonucleotide probes specific for the huntingtin sequence can be used to identify huntingtin clones
and can be designed de novo from the knowledge of the amino acid sequence of the protein as provided herein
in Figure 4 or from the knowledge of the nucleic acid sequence of the DNA encoding such protein as provided
herein in Figure 4 or of a related protein. Alternatively, antibodies can be raised against the huntingtin protein
and used to identify the presence of unique protein determinants in transformants that express the desired
cloned protein.
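
As an illustration of designing a probe de novo from an amino acid sequence, the following minimal sketch counts how many distinct oligonucleotides could encode a given peptide under the standard genetic code, and picks the least degenerate window of a protein sequence. The function names and the example peptide are assumptions made for illustration only; they are not taken from this specification.

```python
# Number of codons that encode each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide):
    """Return the number of distinct DNA sequences that could encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(protein, width):
    """Return (start, window, degeneracy) for the window of `width` residues
    with the lowest degeneracy; the first such window wins ties.
    Returns None if the protein is shorter than `width`."""
    best = None
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = probe_degeneracy(window)
        if best is None or score < best[2]:
            best = (start, window, score)
    return best
```

Windows rich in methionine or tryptophan, and poor in leucine, arginine and serine, give the smallest probe pools, which is why such a scan is a common first step before synthesizing a mixed oligonucleotide probe.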

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a huntingtin protein if that
nucleic acid contains expression control sequences which contain transcriptional regulatory information and
such sequences are "operably linked" to the huntingtin nucleotide sequence which encodes the huntingtin
polypeptide.

An operable linkage is a linkage in which a sequence is connected to a regulatory sequence (or sequences)
in such a way as to place expression of the sequence under the influence or control of the regulatory sequence.
If the two DNA sequences are a coding sequence and a promoter region sequence linked to the 5' end of the
coding sequence, they are operably linked if induction of promoter function results in the transcription of mRNA
encoding the desired protein and if the nature of the linkage between the two DNA sequences does not (1)
result in the introduction of a frame-shift mutation, (2) interfere with the ability of the expression regulatory
sequences to direct the expression of the protein or antisense RNA, or (3) interfere with the ability of the DNA
template to be transcribed. Thus, a promoter region would be operably linked to a DNA sequence if the promoter
was capable of effecting transcription of that DNA sequence.

The precise nature of the regulatory regions needed for gene expression can vary between species or
cell types, but shall in general include, as necessary, 5' non-transcribing and 5' non-translating (non-coding)
sequences involved with initiation of transcription and translation respectively, such as the TATA box, capping
sequence, CAAT sequence, and the like, with those elements necessary for the promoter sequence being
provided by the promoters of the invention. Such transcriptional control sequences can also include enhancer
sequences or upstream activator sequences, as desired.

The vectors of the invention can further comprise other operably linked regulatory elements such as DNA
elements which confer antibiotic resistance, or origins of replication for maintenance of the vector in one or
more host cells.

In another embodiment, especially for maintenance of the vectors of the invention in prokaryotic cells, or
in yeast S. cerevisiae cells, the introduced sequence is incorporated into a plasmid or viral vector capable of
autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for this purpose.
In Bacillus hosts, integration of the desired DNA can be necessary.

Expression of a protein in eukaryotic hosts such as a human cell requires the use of regulatory regions
functional in such hosts. A wide variety of transcriptional and translational regulatory sequences can be
employed, depending upon the nature of the host. Preferably, these regulatory signals are associated in their
native state with a particular gene which is capable of a high level of expression in the specific host cell, such
as a specific human tissue type. In eukaryotes, where transcription is not linked to translation, such control
regions may or may not provide an initiator methionine (AUG) codon, depending on whether the cloned
sequence contains such a methionine. Such regions will, in general, include a promoter region sufficient to direct
the initiation of RNA synthesis in the host cell.

If desired, the non-transcribed and/or non-translated regions 3' to the sequence coding for the huntingtin
protein can be obtained by the above-described cloning methods. The 3' non-transcribed region of the native
human huntingtin gene can be retained for its transcriptional termination regulatory sequence elements, or
for those elements which direct polyadenylation in eukaryotic cells. Where the native expression control
sequences do not function satisfactorily in a host cell, then sequences functional in the host cell can be
substituted.

It may be desired to construct a fusion product that contains a partial coding sequence (usually at the amino
terminal end) of a first protein or small peptide and a second coding sequence (partial or complete) of the
huntingtin protein at the carboxyl end. The coding sequence of the first protein can, for example, function as a
signal sequence for secretion of the huntingtin protein from the host cell. Such first protein can also provide
for tissue targeting or localization of the huntingtin protein if it is to be made in one cell type in a multicellular
organism and delivered to another cell type in the same organism. Such fusion protein sequences can be
designed with or without specific protease sites such that a desired peptide sequence is amenable to subsequent
removal.
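
A fusion of this kind only yields the intended product if the second coding sequence remains in the reading frame of the first, which is the same frame-shift condition stated above for operable linkage. The following is a minimal sketch of such a check, assuming the upstream segment begins at its start codon; the function name and the example sequences are hypothetical and are not part of this specification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fusion_in_frame(upstream_cds, linker):
    """Return True if a coding sequence placed directly after `linker`
    would be read in the frame set by `upstream_cds`.

    Assumes `upstream_cds` starts at its ATG. Two conditions are checked:
    the combined length is a multiple of three (no frame shift), and no
    stop codon occurs in frame before the junction.
    """
    joined = (upstream_cds + linker).upper()
    if len(joined) % 3 != 0:
        return False  # the junction would shift the reading frame
    codons = [joined[i:i + 3] for i in range(0, len(joined), 3)]
    return not any(codon in STOP_CODONS for codon in codons)
```

A linker of six nucleotides keeps the frame, a linker of five shifts it, and a linker carrying an in-frame TAA terminates translation before the second coding sequence is reached.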

The expressed huntingtin protein can be isolated and purified from the medium of the host in accordance
with conventional conditions, such as extraction, precipitation, chromatography, affinity chromatography,
electrophoresis, or the like. For example, affinity purification with anti-huntingtin antibody can be used. A protein
having the amino acid sequence shown in Figure 3 can be made, or a shortened peptide of this sequence can
be made, and used to raise antibodies using methods well known in the art. These antibodies can be used
to affinity purify or quantitate huntingtin protein from any desired source.

If it is necessary to extract huntingtin protein from the intracellular regions of the host cells, the host cells
can be collected by centrifugation, or with suitable buffers, lysed, and the protein isolated by column
chromatography, for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic acid-agarose, or
hydroxyapatite, or by electrophoresis or immunoprecipitation.

II. Use Of Huntingtin For Diagnostic And Treatment Purposes



It is to be understood that although the following discussion is specifically directed to human patients, the
teachings are also applicable to any animal that expresses huntingtin and in which alteration of huntingtin,
especially the amplification of CAG repeat copy number, leads to a defect in the huntingtin gene (structure or
function) or huntingtin protein (structure or function or expression), such that clinical manifestations such as those
seen in Huntington's disease patients are found.

It is also to be understood that the methods referred to herein are applicable to any patient suspected of
developing/having Huntington's disease, whether such condition is manifest at a young age or at a more
advanced age in the patient's life. It is also to be understood that the term "patient" does not imply that symptoms
are present, and patient includes any individual it is desired to examine or treat using the methods of the
invention.

The diagnostic and screening methods of the invention are especially useful for a patient suspected of
being at risk for developing Huntington's disease based on family history, or a patient in which it is desired to
diagnose or eliminate the presence of the Huntington's disease condition as a causative agent behind a
patient's symptoms.

It is to be understood that to the extent that a patient's symptoms arise due to the alteration of the CAG
repeat copy numbers in the huntingtin gene, even without a diagnosis of Huntington's disease, the methods
of the invention can identify the same as the underlying basis for such condition.

According to the invention, presymptomatic screening of an individual in need of such screening for their
likelihood of developing Huntington's disease is now possible using DNA encoding the huntingtin gene of the
invention, and specifically, DNA having the sequence of the normal human huntingtin gene. The screening
method of the invention allows a presymptomatic diagnosis, including prenatal diagnosis, of the presence of
an aberrant huntingtin gene in such individuals, and thus an opinion concerning the likelihood that such
individual would develop or has developed Huntington's disease or symptoms thereof. This is especially valuable
for the identification of carriers of altered huntingtin gene alleles where such alleles possess an increased
number of CAG repeats in their huntingtin gene, for example, from individuals with a family history of Huntington's
disease. Especially useful for the determination of the number of CAG repeats in the patient's huntingtin gene
is the use of PCR to amplify such region, or DNA blotting techniques.

For example, in the method of screening, a tissue sample would be taken from such individual, and
screened for the presence of the "normal" human huntingtin gene, especially for the presence of a "normal"
range of 11-34 CAG copies in such gene. The human huntingtin gene can be characterized based upon, for
example, detection of restriction digestion patterns in "normal" versus the patient's DNA, including RFLP
analysis, using DNA probes prepared against the huntingtin sequence (or a functional fragment thereof) taught in
the invention. Similarly, huntingtin mRNA can be characterized and compared to normal huntingtin mRNA (a)
levels and/or (b) size as found in a human population not at risk of developing Huntington's disease using
similar probes. Lastly, huntingtin protein can be (a) detected and/or (b) quantitated using a biological assay for
huntingtin, for example, using an immunological assay and anti-huntingtin antibodies. When assaying
huntingtin protein, the immunological assay is preferred for its speed. Methods of making antibodies against
huntingtin are well known in the art.
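
The comparison against the "normal" range can be sketched in a few lines. The example below is only a minimal illustration: it finds the longest uninterrupted run of CAG triplets in a sequence read and compares the count with the ranges stated in this document (11-34 copies on normal chromosomes, 37 or more on HD chromosomes). The function names and category labels are assumptions, and counts falling outside both stated ranges are deliberately left unclassified rather than guessed.

```python
import re

NORMAL_RANGE = (11, 34)   # copies reported here for normal chromosomes
HD_MINIMUM = 37           # smallest copy number reported here on HD chromosomes


def longest_cag_run(sequence):
    """Return the number of repeats in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n):
    """Place a repeat count relative to the ranges stated in the text."""
    if NORMAL_RANGE[0] <= n <= NORMAL_RANGE[1]:
        return "normal range"
    if n >= HD_MINIMUM:
        return "expanded"
    return "outside reported ranges"
```

In practice the repeat length is estimated from PCR product size or DNA blots rather than from a finished sequence, so a function such as `classify_repeat_count` would normally be applied to a count obtained by one of those methods.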



(1) An aberrant huntingtin DNA size pattern, such as an aberrant huntingtin RFLP, and/or (2) aberrant
huntingtin mRNA sizes or levels, and/or (3) aberrant huntingtin protein levels would indicate that the patient has
developed or is at risk for developing a huntingtin-associated symptom such as a symptom associated with
Huntington's disease.

The screening and diagnostic methods of the invention do not require that the entire huntingtin DNA coding
sequence be used for the probe. Rather, it is only necessary to use a fragment or length of nucleic acid that
is sufficient to detect the presence of the huntingtin gene in a DNA preparation from a normal or affected
individual, the absence of such gene, or an altered physical property of such gene (such as a change in
electrophoretic migration pattern).

Prenatal diagnosis can be performed when desired, using any known method to obtain fetal cells, including
amniocentesis, chorionic villus sampling (CVS), and fetoscopy. Prenatal chromosome analysis can be used
to determine if the portion of chromosome 4 possessing the normal huntingtin gene is present in a
heterozygous state, and PCR amplification or DNA blotting utilized for estimating the size of the CAG repeat in the
huntingtin gene.

The huntingtin DNA can be synthesized; in particular, the CAG repeat region can be amplified and, if desired,
labeled with a radioactive or nonradioactive reporter group, using techniques known in the art (for example,
see Eckstein, F., Ed., Oligonucleotides and Analogues: A Practical Approach, IRL Press at Oxford University
Press, New York (1992); and Kricka, L.J., Ed., Nonisotopic DNA Probe Techniques, Academic Press, San
Diego (1992)).

In one method of treating Huntington's disease in a patient in need of such treatment, functional huntingtin
DNA is provided to the cells of such patient, preferably prior to such symptomatic state that indicates the death
of many of the patient's neuronal cells which it is desired to target with the method of the invention. The
replacement huntingtin DNA is provided in a manner and amount that permits the expression of the huntingtin
protein provided by such gene, for a time and in a quantity sufficient to treat such patient. Many vector systems
are known in the art to provide such delivery to human patients in need of a gene or protein missing from the
cell. For example, adenovirus or retrovirus systems can be used, especially modified retrovirus systems and
especially herpes simplex virus systems. Such methods are provided for in, for example, the teachings of
Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental Neurology 115:303-
316 (1992); WO93/03743 and WO90/09441, each incorporated herein fully by reference. Methods of antisense
strategies are known in the art (see, for example, Antisense Strategies, Baserga, R. et al., Eds., Annals of the
New York Academy of Sciences, volume 660, 1992).

In another method of treating Huntington's disease in a patient in need of such treatment, a gene encoding
an expressible sequence that transcribes huntingtin antisense RNA is provided to the cells of such patient,
preferably prior to such symptomatic state that indicates the death of many of the patient's neuronal cells which
it is desired to target with the method of the invention. The replacement huntingtin antisense RNA gene is
provided in a manner and amount that permits the expression of the antisense RNA provided by such gene, for
a time and in a quantity sufficient to treat such patient, and especially in an amount to inhibit translation of
the aberrant huntingtin mRNA that is being expressed in the cells of such patient. As above, many vector
systems are known in the art to provide such delivery to human patients in need of a gene or protein which is
altered in the patients' cells. For example, adenovirus or retrovirus systems can be used, especially modified
retrovirus systems and especially herpes simplex virus systems. Such methods are provided for in, for
example, the teachings of Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental
Neurology 115:303-316 (1992); WO93/03743 and WO90/09441, each incorporated herein fully by reference.

Delivery of a DNA sequence encoding a functional huntingtin protein, such as the amino acid encoding
sequence of Figure 4, will effectively replace the altered huntingtin gene of the invention, and inhibit, and/or
stop and/or regress the symptoms that are the result of the interference to huntingtin gene expression due to
an increased number of CAG repeats, such as 37 to 86 repeats in the huntingtin gene as compared to the
11-34 CAG repeats found in human populations not at risk for developing Huntington's disease.

Because Huntington's disease is characterized by a loss of neurons that is most severe in the caudate
and putamen regions of the brain, the method of treatment of the invention is most effective when the
replacement huntingtin gene is provided to the patient early in the course of the disease, prior to the loss of many
neurons due to cell death. For that reason, presymptomatic screening methods according to the invention are
important in identifying those individuals in need of treatment by the method of the invention, and such
treatment preferably is provided while such individual is presymptomatic.

In a further method of treating Huntington's disease in a patient in need of such treatment, such method
provides an antagonist to the aberrant huntingtin protein in the cells of such patient.

Although the method is specifically described for DNA-DNA probes, it is to be understood that RNA
possessing the same sequence information as the DNA of the invention can be used when desired.

For diagnostic assays, huntingtin antibodies are useful for quantitating and evaluating levels of huntingtin
protein, and are especially useful in immunoassays and diagnostic kits.

In another embodiment, the present invention relates to an antibody having binding affinity to a huntingtin
polypeptide or a binding fragment thereof. In a preferred embodiment, the polypeptide has the amino acid
sequence set forth in SEQ ID NO:6, or mutant or species variation thereof, or at least 7 contiguous amino acids
thereof (preferably, at least 10, 15, 20, or 30 contiguous amino acids thereof). Those which bind selectively to
huntingtin would be chosen for use in methods which could include, but should not be limited to, the analysis
of altered huntingtin expression in tissue containing huntingtin.

The antibodies of the present invention include monoclonal and polyclonal antibodies, as well as fragments
of these antibodies. Antibody fragments which contain the idiotype of the molecule can be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragment, the Fab'
fragments, and the Fab fragments.

Of special interest to the present invention are antibodies to huntingtin (or their functional derivatives)
which are produced in humans, or are "humanized" (i.e. non-immunogenic in a human) by recombinant or other
technology. Humanized antibodies may be produced, for example, by replacing an immunogenic portion of an
antibody with a corresponding, but non-immunogenic portion (i.e. chimeric antibodies) (Robinson, R.R. et al.,
International Patent Publication PCT/US86/02269; Akira, K. et al., European Patent Application 184,187;
Taniguchi, M., European Patent Application 171,496; Morrison, S.L. et al., European Patent Application 173,494;
Neuberger, M.S. et al., PCT Application WO 86/01533; Cabilly, S. et al., European Patent Application 125,023;
Better, M. et al., Science 240:1041-1043 (1988); Liu, A.Y. et al., Proc. Natl. Acad. Sci. USA 84:3439-3443
(1987); Liu, A.Y. et al., J. Immunol. 139:3521-3526 (1987); Sun, L.K. et al., Proc. Natl. Acad. Sci. USA 84:214-
218 (1987); Nishimura, Y. et al., Canc. Res. 47:999-1005 (1987); Wood, C.R. et al., Nature 314:446-449
(1985); Shaw et al., J. Natl. Cancer Inst. 80:1553-1559 (1988)). General reviews of "humanized" chimeric
antibodies are provided by Morrison, S.L. (Science 229:1202-1207 (1985)) and by Oi, V.T. et al. (BioTechniques
4:214 (1986)). Suitable "humanized" antibodies can be alternatively produced by CDR or CEA substitution
(Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyan et al., Science 239:1534 (1988); Beidler, C.B. et al.,
J. Immunol. 141:4053-4060 (1988)).

In another embodiment, the present invention relates to a hybridoma which produces the above-described
monoclonal antibody, or binding fragment thereof. A hybridoma is an immortalized cell line which is capable
of secreting a specific monoclonal antibody.

In general, techniques for preparing monoclonal antibodies and hybridomas are well known in the art
(Campbell, "Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology,"
Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-
21 (1980)).

Any animal (mouse, rabbit, and the like) which is known to produce antibodies can be immunized with the
selected polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous
or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of polypeptide
used for immunization will vary based on the animal which is immunized, the antigenicity of the polypeptide
and the site of injection.

The polypeptide may be modified or administered in an adjuvant in order to increase the peptide
antigenicity. Methods of increasing the antigenicity of a polypeptide are well known in the art. Such procedures
include coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or through the
inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma
cells, and allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which
produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA
assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using
procedures known in the art (Campbell, Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, supra (1984)).

For polyclonal antibodies, antibody containing antisera is isolated from the immunized animal and is 
screened for the presence of antibodies with the desired specificity using one of the above-described proce- 
dures. 

In another embodiment of the present invention, the above-described antibodies are detectably labeled.
Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin,
and the like), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, and the like),
fluorescent labels (such as FITC or rhodamine, and the like), paramagnetic atoms, and the like. Procedures for
accomplishing such labeling are well known in the art; for example, see Sternberger et al., J. Histochem.
Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 62:308 (1979); Engval et al., Immunol. 109:129 (1972);
Goding, J. Immunol. Meth. 13:215 (1976). The labeled antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays to identify cells or tissues which express a specific peptide.

In another embodiment of the present invention the above-described antibodies are immobilized on a solid
support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such
as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling
antibodies to such solid supports are well known in the art (Weir et al., "Handbook of Experimental Immunology"
4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym.
34 Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays as well as in immunochromatography.

Furthermore, one skilled in the art can readily adapt currently available procedures, as well as the
techniques, methods and kits disclosed above with regard to antibodies, to generate peptides capable of binding
to a specific peptide sequence in order to generate rationally designed antipeptide peptides; for example, see
Hurby et al., "Application of Synthetic Peptides: Antisense Peptides", in Synthetic Peptides, A User's Guide,
W.H. Freeman, NY, pp. 289-307 (1992), and Kaspczak et al., Biochemistry 28:9230-8 (1989).

Anti-peptide peptides can be generated in one of two fashions. First, the anti-peptide peptides can be
generated by replacing the basic amino acid residues found in the huntingtin peptide sequence with acidic residues,
while maintaining hydrophobic and uncharged polar groups. For example, lysine, arginine, and/or histidine
residues are replaced with aspartic acid or glutamic acid, and glutamic acid residues are replaced by lysine,
arginine or histidine.
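
The charge-swapping rule just described can be written out as a short sketch. The version below is an illustration under stated assumptions, not a prescribed design procedure: the function name is hypothetical, and the choice of aspartic acid as the acidic replacement and lysine as the basic replacement is an arbitrary pick among the alternatives the text allows. Aspartic acid residues in the input are left unchanged because the text names only glutamic acid for the reverse substitution.

```python
BASIC_RESIDUES = {"K", "R", "H"}  # lysine, arginine, histidine


def anti_peptide(sequence, acidic="D", basic="K"):
    """Swap charged residues following the rule in the text.

    Basic residues (K, R, H) become `acidic` (D or E), glutamic acid (E)
    becomes `basic` (K, R or H), and hydrophobic or uncharged polar
    residues are kept as they are.
    """
    swapped = []
    for residue in sequence.upper():
        if residue in BASIC_RESIDUES:
            swapped.append(acidic)
        elif residue == "E":
            swapped.append(basic)
        else:
            swapped.append(residue)
    return "".join(swapped)
```

Passing `acidic="E"` or `basic="R"` selects the other substitutions permitted by the text; the output has the same length as the input, so positions can be compared one to one.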

The manner and method of carrying out the present invention can be more fully understood by those of
skill by reference to the following examples, which examples are not intended in any manner to limit the scope
of the present invention or of the claims directed thereto.
Examples



The gene causing Huntington's disease has been mapped in 4p16.3 but has previously eluded
identification. We have used haplotype analysis of linkage disequilibrium to spotlight a small segment of 4p16.3
as the likely location of the defect. A new gene, huntingtin (IT15), isolated using cloned "trapped" exons from
a cosmid contig of the target area contains a polymorphic trinucleotide repeat that is expanded and unstable
on HD chromosomes. A (CAG)n repeat longer than the normal range of about 11 to about 34 copies was
observed on HD chromosomes from all 75 disease families examined, comprising a wide range of ethnic
backgrounds and 4p16.3 haplotypes. The (CAG)n repeat, which varies from 37 to at least 86 copies on HD
chromosomes, appears to be located within the coding sequence of a predicted about 348 kDa protein that is widely
expressed but unrelated to any known gene. Thus, the Huntington's disease mutation involves an unstable DNA
segment, similar to those described in fragile X syndrome and myotonic dystrophy, acting in the context of a
novel 4p16.3 gene to produce a dominant phenotype.

The following protocols and experimental details are referenced in the examples that follow.

HD Cell Lines. Lymphoblast cell lines from HD families of varied ethnic backgrounds used for genetic
linkage and disequilibrium studies (Conneally et al., Genomics 5:304-308 (1989); MacDonald et al., Nature Genet.
1:99-103 (1992)) have been established (Anderson and Gusella, In Vitro 20:856-858 (1984)) in the Molecular
Neurogenetics Unit, Massachusetts General Hospital, over the past 13 years. The Venezuelan HD pedigree
is an extended kindred of over 10,000 members in which all affected individuals have inherited the HD gene
from a common founder (Gusella et al., Nature 306:234-238 (1983); Gusella et al., Science 225:1320-1326
(1984); Wexler et al., Nature 326:194-197 (1987)).

DNA/RNA Blotting. DNA was prepared from cultured cells and DNA blots prepared and hybridized as
described (Gusella et al., Proc. Natl. Acad. Sci. USA 76:5239-5243 (1979); Gusella et al., Nature 306:234-238
(1983)). RNA was prepared and Northern blotting performed as described in Taylor et al., Nature Genet. 2:223-
227 (1992).

Construction of Cosmid Contig. The initial construction of the cosmid contig was by chromosome walking
(Allitto et al., Genomics 9:104-112 (1991); Lin et al., Somat. Cell Mol. Genet. 17:481-488 (1991)). Two
libraries were employed, a collection of Alu-positive cosmids from the reduced cell
hybrid H39-8C10 (Whaley et al., Som. Cell Mol. Genet. 17:83-91 (1991)) and an arrayed flow-sorted
chromosome 4 cosmid library (NM87545) provided by the Los Alamos National Laboratory. Walking was accomplished
by hybridization of whole cosmid DNA, using suppression of repetitive and vector sequences, to
robot-generated high density filter grids (Nizetic, D. et al., Proc. Natl. Acad. Sci. USA 88:3233-3237 (1991); Lehrach, H.
et al., in Genome Analysis: Genetic and Physical Mapping, Volume 1, Davies, K.E. et al., Ed., Cold Spring
Harbor Laboratory Press, 1991, pp. 39-81). Cosmids L1C2, L69F7, L228B6 and L83D3 were first identified by
hybridization of YAC clone YGA2 to the same arrayed library (Bates et al., Nature Genet. 1:180-187 (1992);
Baxendale et al., Nucleic Acids Res. 19:6651 (1991)). HD cosmid GUS72-2130 was isolated by standard
screening of a GUS72 cosmid library using a single-copy probe. Cosmid overlaps were confirmed by a
combination of clone-to-clone and clone-to-genomic hybridizations, single-copy probe hybridizations and
restriction mapping.



cDNA Isolation and Characterization. Exon probes were isolated and cloned as described (Buckler et al.,
Proc. Natl. Acad. Sci. USA 88:4005-4009 (1991)). Exon probes and cDNAs were used to screen human lambda
ZAPII cDNA libraries constructed from adult frontal cortex, fetal brain, adenovirus transformed retinal cell
line RCA, and liver RNA. cDNA clones, PCR products and trapped exons were sequenced as described (Sanger
et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)). Direct cosmid sequencing was performed as
described (McClatchey et al., Hum. Mol. Genet. 1:521-527 (1992)). Database searches were performed using the
BLAST network service of the National Center for Biotechnology Information (Altschul et al., J. Mol. Biol.
215:403-410 (1990)).

PCR Assay of the (CAG)n Repeat. Genomic primers (SEQ ID NO:3 and SEQ ID NO:4) flanking the (CAG)n
repeat are:

5' ATG AAG GCC TTC GAG TCC CTC AAG TCC TTC 3'

and

5' AAA CTC ACG GTC GGT GCA GCG GCT CCT CAG 3'.

PCR amplification was performed in a reaction volume of 25 μl using 50 ng of genomic DNA, 5 of each
primer, 10 mM Tris, pH 8.3, 5 mM KCl, 2 mM MgCl2, 200 μM dNTPs, 10% DMSO, 0.1 unit Perfectmatch
(Stratagene), 2.5 μCi 32P-dCTP (Amersham) and 1.25 units Taq polymerase (Boehringer Mannheim). After heating
to 94°C for 1.5 minutes, the reaction mix was cycled according to the following program: 40 X
[1'@94°C; 1'@60°C; 2'@72°C]. 5 μl of each PCR reaction was diluted with an equal volume of 95% formamide
loading dye and heat denatured for 2 min. at 95°C. The products were resolved on 5% denaturing
polyacrylamide gels. The PCR product from this reaction using cosmid L191F1 (CAG18) as template was 247 bp. Allele
sizes were estimated relative to a DNA sequencing ladder, the PCR products from sequenced cosmids, and
the invariant background bands often present on the gel. Estimates of allelic variation were obtained by typing
unrelated individuals of largely Western European ancestry, and normal parents of affected HD individuals
from various pedigrees.
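
Because each CAG unit adds three base pairs to the amplified product, an allele size read from the gel can be converted to a repeat number by comparison with the 247 bp reference product. The following minimal sketch assumes that the reference template carries 18 CAG repeats and that the repeat is the only source of length variation in the amplified region; both points are assumptions made for illustration, as are the function names.

```python
REFERENCE_PRODUCT_BP = 247   # product size reported for the reference cosmid
REFERENCE_REPEATS = 18       # ASSUMPTION: repeat number of the reference template
BP_PER_REPEAT = 3            # one CAG triplet


def repeats_from_product_size(size_bp):
    """Estimate the (CAG)n repeat number from a PCR product size in bp."""
    extra_bp = size_bp - REFERENCE_PRODUCT_BP
    return REFERENCE_REPEATS + round(extra_bp / BP_PER_REPEAT)


def product_size_from_repeats(n_repeats):
    """Predict the PCR product size in bp for a given repeat number."""
    return REFERENCE_PRODUCT_BP + BP_PER_REPEAT * (n_repeats - REFERENCE_REPEATS)
```

Under these assumptions a product 57 bp longer than the reference corresponds to 19 additional repeats. Sizes read from a gel carry measurement error, so the rounded estimate should be treated as approximate.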

Typing of HD and normal chromosomes in Examples 5-8. HD chromosomes were derived from
symptomatic individuals and "at risk" individuals known to be gene carriers by linkage marker analysis. All HD
chromosomes were from members of well-characterized HD families of varied ethnic backgrounds used previously
for genetic linkage and disequilibrium studies (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992);
Conneally, P.M., et al., Genomics 5:304-308 (1989)). Three of the 150 families used were large pedigrees, each
descended from a single founder. The large Venezuelan HD pedigree is an extended kindred of over 13,000
members from which we typed 75 HD chromosomes (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler,
N.S., et al., Nature 326:194-197 (1987)). Two other large families that have been described previously as
Family Z and Family D provided 25 and 35 HD chromosomes, respectively (Folstein, S.E., et al., Science 229:776-
779 (1985)). Normal chromosomes were taken from married-ins in the HD families and from unrelated normal
individuals from non-HD families. The DNA tested for all individuals except four was prepared from
lymphoblastoid cell lines or fresh blood (Gusella, J.F., et al., Nature 306:234-238 (1983); Anderson and Gusella, In
Vitro 20:856-858 (1984)). In the exceptional cases, DNA was prepared from frozen cerebellum. No difference
in the characteristics of the PCR products was observed between lymphoblastoid, fresh blood, or brain DNAs.
For five members of the Venezuelan pedigree aged 24-30, we also prepared DNA by extracting pelleted sperm
from semen samples. The length of the HD gene (CAG)n repeat for all DNAs was assessed using polymerase
chain reaction amplification.

Statistical analysis as set forth in Examples 5-8. Associations between repeat lengths and onset age were
assessed by Pearson correlation coefficient and by multivariate regression to assess higher order
associations. Comparisons of the distributions of repeat length for all HD chromosomes and those for individual
families were made by analysis of variance and t-test contrasts between groups. The 95% confidence bands were
computed around the regression line utilizing the general linear models procedure of SAS (SAS Institute Inc.,
SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2 (SAS Institute Inc., Cary, N.C., pp. 848, 1989)).
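
For readers unfamiliar with the first of these statistics, the Pearson correlation coefficient and the accompanying least-squares line can be computed as in the sketch below. This is a plain Python illustration of the textbook formulas, not the SAS procedure used in the examples; the function names are hypothetical, no study data are included, and the multivariate regression and confidence bands are not reproduced.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With repeat lengths as `xs` and onset ages as `ys`, `pearson_r` gives the strength and sign of the linear association and `least_squares_line` gives the regression line around which confidence bands would be drawn.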

Example 1

Application of Exon Amplification to Obtain Trapped Cloned Exons

The HD candidate region defined by discrete recombination events in well-characterized families spans
2.2 Mb between D4S10 and D4S98 as shown in Figure 1. The 500 kb segment between D4S180 and D4S182
displays the strongest linkage disequilibrium with HD, with about 1/3 of disease chromosomes sharing a
common haplotype, anchored by multi-allele polymorphisms at D4S127 and D4S95 (MacDonald et al., Nature
Genet. 1:99-103 (1992)). Sixty-four overlapping cosmids spanning about 480 kb from D4S180 to a location
between D4S95 and D4S182 have been isolated by a combination of information from YAC (Baxendale et al.,
Nucleic Acids Res. 19:6651 (1991)) and cosmid probe hybridization to high density filter grids of a chromosome
4 specific library, as well as additional libraries covering this region. Sixteen of these cosmids providing the
complete contig are shown in Figure 1. We have previously used exon amplification to identify ADDA, the
α-adducin locus; IT10C3, a novel putative transporter gene; and IT11, a novel G protein-coupled receptor kinase
gene, in the region distal to D4S127 (Figure 1).

We have now applied the exon amplification technique to cosmids from the region of the contig proximal
to D4S127. This procedure produces "trapped" exon clones, which can represent single exons or multiple
exons spliced together, and is an efficient method of obtaining probes for screening cDNA libraries. Individual
cosmids were processed, yielding 9 exon clones in the region from cosmids L134B9 to L181B10.

Two non-overlapping cDNAs were initially isolated using exon probes. IT15A was obtained by screening
a transformed adult retinal cell cDNA library with exon clone DL118F5-U. IT16A was isolated by screening an
adult frontal cortex cDNA library with a pool of three exon clones, DL83D3-8, DL83D3-1, and DL228B6-3. By
Northern blot analysis, we discovered that IT15A and IT16A are in fact different portions of the same large
transcript of approximately 10-11 kb. Figure 2 shows an example of a Northern blot containing RNA from
lymphoblastoid cell lines representing a normal individual and 2 independent homozygotes for HD chromosomes
of different haplotypes. The same approximately 10-11 kb transcript was also detected in RNA from a variety
of human tissues (liver, spleen, kidney, muscle and various regions of adult brain).

IT15A and IT16A were used to "walk" in a number of human tissue cDNA libraries in order to obtain the
full-length transcript. Figure 3 shows a representation of 5 cDNA clones which define the IT15 transcript, under
a schematic of the composite sequence derived as described in the legend. Figure 3 also displays the locations
on the composite sequence of the 9 trapped exon clones.

The composite sequence of IT15, containing the entire predicted coding sequence, spans 10,366 bases
including a tail of 18 As, as shown in Figure 4. An open reading frame of 9,432 bases begins with a potential
initiator methionine codon at base 316, located in the context of an optimal translation initiation sequence. An
in-frame stop codon is located 240 bases upstream from this site. The protein product of IT15 is predicted to
be a 348 kDa protein containing 3,144 amino acids. Although the first Met codon in the long open reading frame
has been chosen as the probable initiator codon, we cannot exclude the possibility that translation actually
begins at a more 3' Met codon, producing a smaller protein.
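
The reading-frame figures quoted above can be cross-checked with simple arithmetic. The following is an illustrative sketch only: the function name is hypothetical, and it assumes that the 9,432-base figure counts coding codons only (no stop codon), which is inferred from 9,432 = 3 x 3,144 rather than stated in the text.

```python
def orf_summary(start_base, orf_length_bases):
    """Return (codon count, last coding base) for an ORF in 1-based coordinates."""
    if orf_length_bases % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bases // 3
    last_base = start_base + orf_length_bases - 1
    return codons, last_base


codons, last_base = orf_summary(start_base=316, orf_length_bases=9432)
print(codons)     # 3144
print(last_base)  # 9747

# Average mass per residue implied by the predicted 348 kDa product (in Da).
print(round(348_000 / codons, 1))  # 110.7
```

Under that assumption the coding region would end at base 9,747 of the 10,366-base composite sequence, and the implied average residue mass of roughly 110 Da is in the range usually expected for proteins.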

Example 2

Polymorphic Variation of the (CAG)n Trinucleotide Repeat

The IT15 sequence contains 21 copies of the triplet CAG, encoding glutamine (Figure 5).
When this sequence was compared with genomic sequences that are known to surround simple sequence
repeats (SSRs) in 4p16.3, it was found that normal cosmid L191F1 had 18 copies of the triplet, indicating that
the (CAG)n repeat is polymorphic (Figure 5). Primers from the genomic sequence flanking the repeat were
chosen to establish a PCR assay for this variation. In the normal population, this SSR polymorphism displays
at least 17 discrete alleles (Table 1) ranging from about 11 to about 34 repeat units. Ninety-eight percent of
the 173 normal chromosomes tested contained repeat lengths between 11 and 24 repeats. Two chromosomes
were detected in the 25-30 repeat range and 2 normal chromosomes had 33 and 34 repeats, respectively. The
overall heterozygosity on normal chromosomes was 80%. Based on sequence analysis of three clones, it
appears that the variation is based on the (CAG)n, but the potential for variation of the smaller downstream
(CCG)7, which is also included in the PCR product, is also present.
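
As an illustration of how repeat copy number might be read from sequence data, the sketch below counts the longest uninterrupted run of a triplet. It is not the PCR assay described above. The function name and the flanking bases are hypothetical; only the copy numbers (21, 18 and 7) are taken from the text, and the presence of (CCG)7 in the toy cosmid sequence is an assumption made for the example.

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical flanking bases; only the repeat counts mirror the text.
toy_cdna = "GATTACA" + "CAG" * 21 + "CCG" * 7 + "TTGACC"
toy_cosmid = "GATTACA" + "CAG" * 18 + "CCG" * 7 + "TTGACC"

print(longest_repeat_run(toy_cdna))         # 21
print(longest_repeat_run(toy_cosmid))       # 18
print(longest_repeat_run(toy_cdna, "CCG"))  # 7

# Each additional CAG unit lengthens an amplified product by 3 bp.
print(3 * (longest_repeat_run(toy_cdna) - longest_repeat_run(toy_cosmid)))  # 9
```

Because a PCR product spanning both repeats changes length when either one varies, product size alone cannot separate (CAG)n variation from (CCG)n variation, which is why the sequence analysis of clones matters here.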

Example 3

Instability of the Trinucleotide Repeat on HD Chromosomes

Sequence analysis of cosmid GUS72-2130, derived from a chromosome with the major HD haplotype (see
below), revealed 48 copies of the trinucleotide repeat, far greater than the largest normal allele (Figure 5). When
the PCR assay was applied to HD chromosomes, a pattern strikingly different from the normal variation was
observed. HD heterozygotes contained one discrete allelic product in the normal size range, and one PCR
product of much larger size, suggesting that the (CAG)n repeat on HD chromosomes is expanded relative to normal
chromosomes.

Figure 6 shows the patterns observed when the PCR assay was performed on lymphoblast DNA from a
selected nuclear family in a large Venezuelan HD kindred. In this family, DNA marker analysis has shown
previously that the HD chromosome was transmitted from the father (lane 2) to seven children (lanes 3, 5, 6, 7,
8, 10 and 11). The three normal chromosomes present in this mating yielded a PCR product in the normal size
range (AN1, AN2, AN3) that was inherited in a Mendelian fashion. The HD chromosome in the father yielded
a diffuse, fuzzy-appearing PCR product slightly smaller than the 48 repeat product of the non-Venezuelan
HD cosmid. Except for the DNA in lane 5, which did not PCR amplify, and in lane 11, which displayed only a
single normal allele, each of the affected children's DNAs yielded a fuzzy PCR product of a different size (AE),
indicating instability of the HD chromosome (CAG)n repeat. Lane 6 contained an HD-specific product slightly
smaller than or equal to that of the father's DNA. Lanes 3, 7, 10 and 8, respectively, contained HD-specific
PCR products of progressively larger size. The absence of an HD-specific PCR product in lane 11 suggested
that this child's DNA possessed a (CAG)n repeat that was too long to amplify efficiently. This was verified by
Southern blot analysis, in which the expanded HD allele was easily detected and estimated to contain up to
100 copies of the repeat. Notably, this child had juvenile onset of HD at the very early age of 2 years. The
onset of HD in the father was in his early 40s, typical of most adult HD patients in this population. The onset
ages of children represented by lanes 3, 7, 10 and 8 were 26, 25, 14 and 11 years, respectively, suggesting
a rough correlation between age at onset of HD and the length of the (CAG)n repeat on the HD chromosome.
In keeping with this trend, the offspring represented in lane 6, with the fewest repeats, remained asymptomatic
when last examined at age 30.

Figure 7 shows PCR analysis for a second sibship from the Venezuelan pedigree in which both parents
are HD heterozygotes carrying the same HD chromosome based on DNA marker studies. Several of the
offspring are HD homozygotes (lanes 6+7, 10+11, 13+14, 17+18, 23+24) as reported previously (Wexler et al.,
Nature 326:194-197 (1987)). Each parent's DNA contained one allele in the normal range (AN1, AN2) which
was transmitted in a Mendelian fashion. The HD-specific products (AE) from the DNA of both parents and
children were all much larger than the normal allelic products and also showed extensive variation in mean size.
A neurologic diagnosis for the offspring in this pedigree was not provided to maintain the blind status of
investigators involved in the ongoing Venezuela HD project, although age of onset again appears to parallel repeat
length. Paired samples under many of the individual symbols represent independent lymphoblast lines initiated
at least one year apart. The variance between paired samples was not as great as between the different
individuals, suggesting that the major differences in size of the PCR products resulted from meiotic transmission.
Of special note is the result obtained in lanes 13 and 14. This HD homozygote's DNA yielded one PCR product
larger and one smaller than the HD-specific PCR products of both parents.

To date, we have tested 75 independent HD families, representing all different haplotypes reported in MacDonald et
al., Nature Genet. 1:99-103 (1992), and a wide range of ethnic backgrounds. In all 75 cases, a PCR product
larger than the normal size range was produced from the HD chromosome. The sizes of the HD-specific
products ranged from 42 repeat copies to more than 66 copies, with a few individuals failing to yield a product
because of the extreme length of the repeat. In these cases, Southern blot analysis revealed an increase in the
length of an EcoRI fragment, with the largest allele approximating 100 copies of the repeat. Figure 8 shows
the variation detected in members of an American family of Irish ancestry in which the major HD haplotype is
segregating. Cosmid GUS72-2130 was cloned from the HD homozygous individual whose DNA was amplified
in lane 2. As was observed in the Venezuelan HD pedigree (Figures 6 and 7), which segregates the disorder
with a different 4p16.3 haplotype, the HD-specific PCR products for this family display considerable size
variation.

Example 4 

New Mutations to HD 

The mutation rate in HD has been reported to be very low. To test whether the expansion of the (CAG)n
repeat is the mechanism by which new HD mutations occur, two pedigrees with sporadic cases of HD have
been examined in which intensive searching failed to reveal a family history of the disorder. In these cases,
pedigree information sufficient to identify the same chromosomes in both the affected individual and
unaffected relatives was gathered. Figures 9 and 10 show the results of PCR analysis of the (CAG)n repeat in these
families. The chromosomes in each family were assigned an arbitrary number based on typing for a large
number of RFLP and SSR markers in 4p16.3 defining distinct haplotypes, and the presumed HD chromosome is
starred.

In family #1, HD first appeared in individual II-3, who transmitted the disorder to III-1 along with
chromosome 3*. This same chromosome was present in II-2, an elderly unaffected individual. PCR analysis revealed
that chromosome 3* from II-2 produced a PCR product at the extreme high end of the normal range (about 36
CAG copies). However, the (CAG)n repeat on the same chromosome in II-3 and III-1 had undergone sequential
expansions to about 44 and about 46 copies, respectively. A similar result was obtained in family #2, where
the presumed HD mutant III-2 had a considerably expanded repeat relative to the same chromosome in II-1
and III-1 (about 49 vs. about 33 CAG copies). In both family #1 and family #2, the ultimate HD chromosome
displays the marker haplotype characteristic of 1/3 of all HD chromosomes, suggesting that this haplotype may
be predisposed to undergoing repeat expansion.

Discussion 

The discovery of an expanded, unstable trinucleotide repeat on HD chromosomes within the IT15 gene
is the basis for utilizing this gene as the HD gene of the invention. These results are consistent with the
interpretation that HD constitutes the latest example of a mutational mechanism that may prove quite common in
human genetic disease. Elongation of a trinucleotide repeat sequence has been implicated previously as the
cause of three quite different human disorders: the fragile X syndrome, myotonic dystrophy and spino-bulbar
muscular atrophy. The initial observations of repeat expansion in HD indicate that this phenomenon shares
features in common with each of these disorders.

In the fragile X syndrome, expression of a constellation of symptoms that includes mental retardation and
a fragile site at Xq27.3 is associated with expansion of a (CGG)n repeat thought to be in the 5' untranslated
region of the FMR1 gene (Fu et al., Cell 67:1047-1058 (1991); Kremer et al., Science 252:1711-1714 (1991);
Verkerk et al., Cell 65:904-914 (1991)). In myotonic dystrophy, a dominant disorder involving muscle weakness
with myotonia that typically presents in early adulthood, the unstable trinucleotide repeat, (CTG)n, is located in
the 3' untranslated region of the myotonin protein kinase gene (Aslanidis et al., Nature 355:548-551 (1992);
Brook et al., Cell 66:799-808 (1992); Buxton et al., Nature 355:547-548 (1992); Fu et al., Science 255:1256-
1259 (1992); Harley et al., Lancet 339:1125-1128 (1992); Mahadevan et al., Science 255:1253-1255 (1992)).
The unstable (CAG)n repeat in HD may be within the coding sequence of the IT15 gene, a feature shared with
spino-bulbar muscular atrophy, an X-linked recessive adult-onset disorder of the motor neurons caused by
expansion of a (CAG)n repeat in the coding sequence of the androgen receptor gene (LaSpada et al., Nature
352:77-79 (1991)). The repeat length in both the fragile X syndrome and myotonic dystrophy tends to increase
in successive generations, sometimes quite dramatically. Occasionally, decreases in the average repeat length
are observed (Fu et al., Science 255:1256-1259 (1992); Yu et al., Am. J. Hum. Genet. 50:968-980 (1992);
Bruner et al., N. Engl. J. Med., 476-480 (1993)). The HD trinucleotide repeat is also unstable, usually expanding
when transmitted to the next generation, but contracting on occasion. In HD, as in the other disorders, change
in copy number occurs in the absence of recombination. Compared with the fragile X syndrome, myotonic
dystrophy, and HD, the instability of the disease allele in spino-bulbar muscular atrophy is more limited, and
dramatic expansions of repeat length have not been seen (Biancalana et al., Hum. Mol. Genet. 1:255-258 (1992)).

Expansion of the repeat length in myotonic dystrophy is associated with a particular chromosomal
haplotype, suggesting the existence of a primordial predisposing mutation (Harley et al., Am. J. Hum. Genet. 49:66-
75 (1991); Harley et al., Nature 355:545-546 (1992); Ashizawa, Lancet 338:642-643 (1991); and Epstein
(1991)). In the fragile X syndrome, there may be a limited number of ancestral mutations that predispose to
increases in trinucleotide repeat number (Richards et al., Nature Genet. 1:257-260 (1992); Oudet et al., Am.
J. Hum. Genet. 52:297-304 (1993)). The linkage disequilibrium analysis used to identify IT15 indicates that
there are several haplotypes associated with HD, but that at least 1/3 of HD chromosomes are ancestrally
related (MacDonald et al., Nature Genet. 1:99-103 (1992)). These data, combined with the reported low rate of
new mutation to HD (Harper, J. Med. Genet. 89:365-376 (1992)), suggest that expansion of the trinucleotide
repeat may only occur on select chromosomes. The analysis of two families presented herein, in which new
mutation was supposed to have occurred, is consistent with the view that there may be particular normal
chromosomes that have the capacity to undergo expansion of the repeat into the HD range. In each of these
families, a (CAG)n repeat length in the upper end of the normal range was segregating on a
chromosome whose 4p16.3 haplotype matched the most common haplotype seen on HD chromosomes, and
the clinical appearance of HD in these two cases was associated with expansion of the trinucleotide repeat.

The recent application of haplotype analysis to explore the linkage disequilibrium on HD chromosomes
pointed to a portion of a 2.2 Mb candidate region defined by the majority of recombination events described
in HD pedigrees (MacDonald et al., Nature Genet. 1:99-103 (1992)). Previously, the search for the gene was
confounded by three matings in which the genetic inheritance pattern was inconsistent with the remainder of
the family (MacDonald et al., Neuron 3:183-190 (1989b); Pritchard et al., Am. J. Hum. Genet. 50:1218-1230
(1992)). These matings produced apparently affected HD individuals despite the inheritance of only normal
alleles for markers throughout 4p16.3, effectively excluding inheritance of the HD chromosome present in the
rest of the pedigree. Using the PCR assay disclosed above, each of these families was tested, and it was
determined that, like other HD kindreds, an expanded allele segregates with HD in affected individuals of all three
pedigrees. However, an expanded allele was not present in those specific individuals with the inconsistent
4p16.3 genotypes. Instead, these individuals displayed the normal alleles expected based on analysis of other
markers in 4p16.3. It is conceivable that these inconsistent individuals do not, in fact, have HD, but some other
disorder. Alternatively, they might represent genetic mosaics in which the HD allele is more heavily represented
and/or more expanded in brain tissue than in the lymphoblast DNA used for genotyping.

The capacity to monitor directly the size of the trinucleotide repeat in individuals "at risk" for HD provides
significant advantages over current methods, eliminating the need for complicated linkage analyses,
facilitating genetic counseling, and extending the applicability of presymptomatic and prenatal diagnosis to "at risk"
individuals with no living affected relatives. However, it is of the utmost importance that the current
internationally accepted guidelines and counseling protocols for testing those "at risk" continue to be observed and
that samples from unaffected relatives not be tested inadvertently or without full consent. In the series
of patients examined in this study, there is an apparent correlation between repeat length and age of onset
of the disease, reminiscent of that reported in myotonic dystrophy (Harley et al., Lancet 339:1125-1128 (1992);
Tsilfidis et al., Nature Genet. 1:192-195 (1992)). The largest HD trinucleotide repeat segments were found in
juvenile onset cases, where there is a known preponderance of male transmission (Merrit et al., Excerpta
Medica, Amsterdam, pp. 645-650 (1969)).

The expression of fragile X syndrome is associated with direct inactivation of the FMR1 gene (Pierretti et
al., Cell 66:817-822 (1991); DeBoulle et al., Nature Genet. 3:31-35 (1993)). The recessive inheritance pattern
of spino-bulbar muscular atrophy suggests that in this disorder, an inactive gene product is produced. In
myotonic dystrophy, the manner in which repeat expansion leads to the dominant disease phenotype is unknown.
There are numerous possibilities for the mechanism of pathogenesis of the expanded trinucleotide repeat in
HD. Without intending to be held to this theory, it can be noted that, since Wolf-Hirschhorn
patients hemizygous for 4p16.3 do not display features of HD, and IT15 mRNA is present in HD homozygotes,
the expanded trinucleotide repeat does not cause simple inactivation of the gene containing it. The observation
that the phenotype of HD is completely dominant, since homozygotes for the disease allele do not differ
clinically from heterozygotes, has suggested that HD results from a gain of function mutation, in which either the
mRNA product or the protein product of the disease allele would have some new property, or be expressed
inappropriately (Wexler et al., Nature 326:194-197 (1987); Myers et al., Am. J. Hum. Genet. 45:615-618 (1989)).
If the expanded trinucleotide repeat were translated, the consequences for the protein product would be
dramatic, increasing the length of the poly-glutamine stretch near the N-terminus. It is possible, however, that
despite the presence of an upstream Met codon, the normal translational start occurs 3' to the (CAG)n repeat
and there is no poly-glutamine stretch in the protein product. In this case, the repeat would be in the 5'
untranslated region and might be expected to have its dominant effect at the mRNA level. The presence of an
expanded repeat might directly alter regulation, localization, stability or translatability of the mRNA containing
it, and could indirectly affect its counterpart from the normal allele in HD heterozygotes. Other conceivable
scenarios are that the presence of an expanded repeat might alter the effective translation start site for the
HD transcript, thereby truncating the protein, or alter the transcription start site for the IT15 gene, disrupting
control of mRNA expression. Finally, although the repeat is located within the IT15 transcript, the possibility
that it leads to HD by virtue of an action on the expression of an adjacent gene cannot be excluded.

Despite this final caveat, it is consistent with the above results and most likely that the trinucleotide repeat
expansion causes HD by its effect, either at the mRNA or protein level, on the expression and/or structure of
the protein product of the IT15 gene, which has been named huntingtin. Outside of the region of the triplet
repeat, the IT15 DNA sequence detected no significant similarity to any previously reported gene in the
GenBank database. Except for the stretches of glutamine and proline near the N-terminus, the amino acid sequence
displayed no similarity to known proteins, providing no conspicuous clues to huntingtin's function. The
poly-glutamine and poly-proline regions near the N-terminus detect similarity with a large number of proteins which
also contain long stretches of these amino acids. It is difficult to assess the significance of such similarities,
although it is notable that many of these are DNA binding proteins and that huntingtin does have a single leucine
zipper motif at residue 1,443. Huntingtin appears to be widely expressed, and yet cell death in HD is confined
to specific neurons in particular regions of the brain.

Example 5

Distribution of Trinucleotide Repeat Lengths on Normal and HD Chromosomes 

The number of copies of the HD triplet repeat has been examined in a total of 425 HD chromosomes from
150 independent families and compared with the copy number of the HD triplet repeat of 545 normal
chromosomes. The results are displayed in Figure 11. Two non-overlapping distributions of repeat length were
observed, wherein the upper end of the normal range and the lower end of the HD range were separated by 3
repeat units. The normal chromosomes displayed 24 alleles producing PCR products ranging from 11 to 34
repeat units, with a median of 19 units (mean 19.71, s.d. 3.21). The HD chromosomes yielded 54 discrete PCR
products corresponding to repeat lengths of 37 to 86 units, with a median of 45 units (mean 46.42, s.d. 6.68).

Of the HD chromosomes, 134 and 161 were known to be maternally or paternally-derived, respectively.
To investigate whether the sex of the transmitting parent might influence the distribution of repeat lengths,
these two sets of chromosomes were plotted separately in Figure 12. The maternally-derived chromosomes
displayed repeat lengths ranging from 37 to 73 units, with a median of 44 (mean 44.93, s.d. 5.14). The
paternally-derived chromosomes had 37 to 86 copies of the repeat unit, with a median of 48 units (mean 49.14, s.d.
8.27). However, a higher proportion of the paternally-derived HD chromosomes had repeat lengths greater than
55 units (16% vs. 2%), suggesting the possibility of a differential effect of paternal versus maternal
transmission.

The data set used excluded chromosomes from a few clinically diagnosed individuals who have previously
been shown not to have inherited the HD chromosome by DNA marker linkage studies (MacDonald, M.E., et
al., Neuron 3:183-190 (1989); Pritchard, C., et al., Am. J. Hum. Genet. 50:1218-1230 (1992)). These individuals
have repeat lengths well within the normal range. Their disease manifestations have not been explained and
they may represent phenocopies of HD. Regardless of the mechanism involved, the occurrence at low
frequency of such individuals within known HD families must be considered if diagnostic conclusions are based solely
on repeat length.

The control data set also excludes a number of chromosomes from phenotypically normal individuals who
are related to "spontaneous" cases of HD or "new mutations". Chromosomes from these individuals, who are
not clinically affected and have no family history of the disorder, cannot be designated as HD. However, these
chromosomes cannot be classified as unambiguously normal because they are essentially the same
chromosome as that of an affected relative, the diagnosed "spontaneous" HD proband, except with respect to repeat
length. The lengths of repeat found on these ambiguous chromosomes (34-38 units) span the gap between
the control and HD distributions, confounding a decision on the status of any individual with a repeat in the
high normal to low HD range.
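
The overlap described above can be made concrete with a small lookup. This is a sketch for illustration only, not a diagnostic rule: the dictionary, labels and function name are hypothetical, and the bounds are simply the observed ranges quoted in this Example (normal 11-34, ambiguous 34-38, HD 37-86).

```python
# Observed ranges quoted in this Example, as inclusive repeat-unit counts.
# They describe this data set only and are not diagnostic cut-offs.
OBSERVED_RANGES = {
    "normal": (11, 34),
    "ambiguous": (34, 38),  # relatives of "spontaneous" HD cases
    "HD": (37, 86),
}


def matching_ranges(repeat_units):
    """Return the labels of every observed range that contains `repeat_units`."""
    return [
        label
        for label, (low, high) in OBSERVED_RANGES.items()
        if low <= repeat_units <= high
    ]


for n in (19, 34, 36, 37, 45, 90):
    print(n, matching_ranges(n))
```

A length of 34 falls in both the normal and ambiguous ranges, 37 in both the ambiguous and HD ranges, and 36 in the ambiguous range alone, which is the difficulty the paragraph above points out. Returning every matching label, rather than forcing a single call, keeps that uncertainty visible.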

Example 6 

Instability of the Trinucleotide Repeat 

The data in Figure 11 combine repeat lengths from 150 different HD families representing many potentially
independent origins of the defect. To examine the variation in repeat lengths on sets of HD chromosomes known
to descend from a common founder, the data from three large HD kindreds (Gusella, J.F., et al., Nature 306:234-
238 (1983); Wexler, N.S., et al., Nature 326:194-197 (1987); Folstein, S.E., et al., Science 229:776-779 (1985))
with different 4p16.3 haplotypes (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992)), typed for 75, 25
and 35 individuals, respectively, were separated. Despite the single origin of the founder HD chromosome
within each pedigree, members of the separate pedigrees display a wide range of repeat lengths (Figure 13). This
instability of the HD chromosome repeat is most prominent in members of a large Venezuelan HD kindred
(panel A) in which the common HD ancestor has produced 10 generations of descendants, numbering over 13,000
individuals. The distribution of repeat lengths in this sampling of the Venezuelan pedigree (median 46, mean
48.26, s.d. 9.3) is not significantly different from that of the larger sample of HD chromosomes from all families.
Panels B and C display results for two extended families in which HD was introduced more recently than in
the Venezuelan kindred. These families have been reported to exhibit different age of onset distributions and
varied phenotypic features of HD (Folstein, S.E., et al., Science 229:776-779 (1985)). Both revealed extensive
repeat length variation, with a median of 41 and 49 repeat units, respectively. The distribution of repeat lengths
in the members of the family in Panel B was significantly different from the distribution of all HD chromosome
repeat lengths (p<0.0001), with a smaller mean of 42.04 repeat units (s.d. 2.82). The repeat distribution from
HD chromosomes of Panel C was also significantly different from the total data set (p<0.004), but with a higher
mean of 49.80 (s.d. 5.86).

Example 7

Parental Source Effects on Repeat Length Variation 

For 62 HD chromosomes in Figure 11, the length of the trinucleotide repeat also could be examined on
the corresponding parental HD chromosome. In 20 of 25 maternal transmissions, and in 31 of 37 paternal
transmissions, the repeat length was altered, indicating considerable instability. A similar phenomenon was not
observed for normal chromosomes, where more than 500 meiotic transmissions revealed no changes in repeat
length, although the very existence of such a large number of normal alleles suggests at least a low degree
of instability.

Figure 14 shows the relationship between the repeat lengths on the HD chromosomes in the affected
parent and corresponding progeny. For the 20 maternally-inherited chromosomes on which the repeat length was
altered, 13 changes were increases in length and 7 were decreases. Both increases and decreases involved
changes of less than 5 repeat units, and the overall correlation between the mother's repeat length and that
of her child was r=0.95 (p<0.0001). The average change in repeat length in the 25 maternal transmissions was
an increase of 0.4 repeats.

On paternally-derived chromosomes, the 31 transmissions in which the repeat length changed comprised
26 length increases and 5 length decreases. Although the decreases in size were only slightly smaller than
those observed on maternally-derived chromosomes, ranging from 1 to 3 repeat units, the increases were
often dramatically larger. Thus, the correlation of the repeat length in the father with that of his offspring was
only r=0.35 (p<0.04). The average change in the 37 paternal transmissions was an increase of 9 repeat units.
The maximum length increase observed through paternal transmission was 41 repeat units, a near doubling
of the parental repeat.
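
The bookkeeping behind such transmission summaries (increases, decreases, mean change, largest increase) can be sketched as follows. The function name is hypothetical and the parent/child repeat lengths are invented for illustration; they are not data from the study.

```python
def summarize_transmissions(pairs):
    """Summarize repeat-length changes for (parent_repeat, child_repeat) pairs."""
    changes = [child - parent for parent, child in pairs]
    return {
        "increases": sum(1 for c in changes if c > 0),
        "decreases": sum(1 for c in changes if c < 0),
        "unchanged": sum(1 for c in changes if c == 0),
        "mean_change": sum(changes) / len(changes),
        "max_increase": max(changes),
    }


# Hypothetical parent/child repeat lengths, NOT data from the study.
toy_pairs = [(44, 45), (46, 46), (42, 41), (45, 60), (43, 47)]
print(summarize_transmissions(toy_pairs))
# {'increases': 3, 'decreases': 1, 'unchanged': 1, 'mean_change': 3.8, 'max_increase': 15}
```

In the toy data a single large jump (45 to 60) dominates the mean, which is the same pattern reported above for paternal transmissions: most changes are small, but a few large expansions raise the average.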

For both male and female transmissions, there was no correlation between the size of the parental repeat 
and either the magnitude or frequency of changes. 

To determine whether the variation in the length of the repeat observed through male transmission of HD
chromosomes is reflected in the male germ cells, we amplified the repeat from sperm DNA and from DNA of
the corresponding lymphoblast from 5 HD gene carriers. The results, shown in Figure 15, reveal striking
differences between the lymphoblast and sperm DNA for the HD chromosome repeat, but not for the repeat on
the normal chromosome. All the sperm donors are members of the Venezuelan HD family and range in age
from 24 to 30. Individuals 1 and 2 are siblings with HD chromosome repeat lengths, based on lymphoblast
DNA, of 45 and 52, respectively. Individuals 3 and 4 are also siblings, with HD repeat lengths of 46 and 49,
respectively. Individual 5, from a different sibship than either of the other two pairs, has an HD repeat of 52
copies. In all 5 cases, the PCR amplification of sperm DNA and lymphoblast DNA yielded identical products
from the normal chromosome. However, in comparison with lymphoblast DNA, the HD gene from sperm DNA
yielded a diffuse array of products. In 3 of the 5 cases (2, 4 and 5), the diffuse array spread to much larger
allelic products than the corresponding lymphoblast product. Subject 2 showed the greatest range of
expansion, with the sperm DNA product extending to over 80 repeat units. Interestingly, the 3 individuals displaying
the greatest variation have the longest repeats and are currently symptomatic. The other two donors have
shorter repeat lengths in the HD range, and remain at risk at this time.

The difference in the high repeat length range (>55) between HD chromosomes transmitted from
the father and those transmitted from the mother indicated a potential parental source effect. When this was
examined directly, the HD chromosome repeat length changed in about 85% of transmissions. Most changes
involved a fluctuation of only a few repeat units, with larger increases occurring only in male transmissions.
The greater size increases in male transmission appear to be caused by particular instability of the HD
trinucleotide repeat during male gametogenesis, based on the amplification of the repeat from sperm DNA.

Example 8 

Relationship between Repeat Length and Age of Onset 

Increased repeat length might correlate with a reduced age of onset of HD. Accordingly, age of onset data
were determined for 234 of the individuals represented in Figure 11. Figure 16 displays the repeat lengths found
on the HD and normal chromosomes of these individuals relative to their age of onset. Indeed, age of onset
is inversely correlated with the HD repeat length. A Pearson correlation coefficient of r=-0.75, p<0.0001 was
obtained assuming a linear relationship between age of onset and repeat length. When a polynomial function
was used, a better fit was obtained (R²=0.61, F=121.45), suggesting a higher order association between age
of onset and repeat length.
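The two fits described above can be sketched in plain Python as follows. This is a sketch under stated assumptions: the helper names (`pearson_r`, `solve_linear_system`, `fit_polynomial`) and the `repeats` and `onset` values are hypothetical, the polynomial degree is assumed to be 2 because the text does not state it, and no p-value or F statistic is computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns (coefficients, R squared).

    Coefficients apply to x centered on its mean, which keeps the
    normal equations well conditioned.
    """
    mean_x = sum(xs) / len(xs)
    powers = [[(x - mean_x) ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    a = [[sum(p[i] * p[j] for p in powers) for j in range(size)] for i in range(size)]
    b = [sum(p[i] * y for p, y in zip(powers, ys)) for i in range(size)]
    coeffs = solve_linear_system(a, b)
    fitted = [sum(c * p[k] for k, c in enumerate(coeffs)) for p in powers]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coeffs, 1.0 - ss_res / ss_tot


# Hypothetical (repeat length, age of onset) observations.
repeats = [38, 40, 42, 44, 46, 48, 50, 55, 60, 70, 80, 90]
onset = [62, 55, 50, 45, 41, 37, 33, 26, 20, 11, 6, 3]

r = pearson_r(repeats, onset)
_, r2_quadratic = fit_polynomial(repeats, onset, 2)
print("linear r:", round(r, 3), "linear R^2:", round(r * r, 3))
print("quadratic R^2:", round(r2_quadratic, 3))
```

Because the linear model is nested in the quadratic one, the quadratic R² can never be lower than the linear R²; the question in practice is whether the gain is large enough to matter.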

There is considerable variation in the age of onset associated with any specific number of repeat units,
particularly for trinucleotide repeats in the 37-52 unit zone (88% of HD chromosomes) where onset ranged
from 15 to 75 years. In this range, a linear relationship between age of onset and repeat length provided as
good a fit as a higher order relationship. The 95% confidence interval surrounding the predicted regression
line was estimated at ±18 years. In the 37 to 52 unit range, the association of repeat length to onset age is
only half as strong as in the overall distribution (r=-0.40, p<0.0001), indicating that much of the predictive power
is contributed by repeats longer than 52 units. In this increased range, onset is likely to be very young and
consequently not relevant to most persons seeking testing.
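The practical meaning of a ±18 year interval can be shown with a short sketch. Only the half-width of 18 years comes from the text; the function `onset_interval` and the `intercept` and `slope` values are hypothetical placeholders, since the fitted regression coefficients are not given here.

```python
def onset_interval(repeat_length, intercept, slope, half_width=18.0):
    """Return (lower, predicted, upper) age of onset for a linear model.

    The lower bound is clipped at zero because a negative age is meaningless.
    """
    predicted = intercept + slope * repeat_length
    lower = max(predicted - half_width, 0.0)
    upper = predicted + half_width
    return lower, predicted, upper


# Hypothetical regression line; not the coefficients fitted in the study.
lower, predicted, upper = onset_interval(44, intercept=100.0, slope=-1.3)
print(f"44 repeats: predicted {predicted:.1f} years, interval {lower:.1f} to {upper:.1f}")
```

With these placeholder coefficients the interval for 44 repeats spans 36 years, which illustrates why repeat length in the 37 to 52 unit range is a weak predictor for an individual.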

For the 178 cases in the 37-52 repeat unit range for which it was possible to subdivide the data set based
on parental origin of the HD gene, multivariate regression analysis suggested a significant effect of parental
origin on age of onset (p<0.05) independent of repeat length in this range. HD gene carriers from maternal
transmissions had an average age of onset two years later than those from paternal transmissions.
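An effect of parental origin "independent of repeat length" can be estimated by first removing the linear contribution of repeat length from both the age of onset and the parental origin indicator, and then regressing one set of residuals on the other. For ordinary least squares this gives the same coefficient as a regression on both predictors together. The sketch below assumes that approach; the function names and the data are hypothetical, and no significance test is performed.

```python
def simple_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(xs, ys):
    """Residuals of ys after removing the linear effect of xs."""
    intercept, slope = simple_fit(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def origin_effect_adjusted(repeats, maternal, onset):
    """Shift in onset age for maternal origin, adjusted for repeat length."""
    onset_res = residuals(repeats, onset)
    origin_res = residuals(repeats, maternal)
    _, effect = simple_fit(origin_res, onset_res)
    return effect


# Hypothetical data: 1 marks maternal transmission, 0 paternal transmission.
repeats = list(range(38, 53))
maternal = [1 if i % 2 == 0 else 0 for i in range(len(repeats))]
onset = [100.0 - 1.3 * r + 2.0 * m for r, m in zip(repeats, maternal)]

print("adjusted maternal effect (years):",
      round(origin_effect_adjusted(repeats, maternal, onset), 3))
```

The invented onset ages were built with a two-year maternal shift, so the sketch recovers a value of about 2.0, mirroring the size of the effect reported above.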

In both univariate and multivariate analyses, no association between age of onset and the repeat length 
on the normal chromosome was detected, either in the total data set, or when it was subdivided into chromo- 
somes of maternal or paternal origin. 

All publications mentioned hereinabove are hereby incorporated in their entirety by reference.

While the foregoing invention has been described in some detail for purposes of clarity and understanding,
it will be appreciated by one skilled in the art from a reading of this disclosure that various changes in form
and detail can be made without departing from the true scope of the invention and appended claims.
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: THE GENERAL HOSPITAL CORPORATION 
Fruit Street 

Boston, Massachusetts 02114 
United States of America 



(ii) TITLE OF INVENTION: Huntingtin DNA, Protein And Uses Thereof 

(iii) NUMBER OF SEQUENCES: 6

(iv) CORRESPONDENCE ADDRESS: 
(A) KILBURN & STRODE

(B) 30 JOHN STREET

(C) LONDON

(D) GREAT BRITAIN

(E) WC1N 2DD



(v) COMPUTER READABLE FORM: 



(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25



(vi) CURRENT APPLICATION DATA:

(A) 7th March 1994 




(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/085,000
(B) FILING DATE: 01 JULY 1993 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/027,498 

(B) FILING DATE: 05 MARCH 1993



(2) INFORMATION FOR SEQ ID NO:1:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GGCGGGAGAC CGCCATGGCG 20 
(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
AATACGACTC ACTATAG 17 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
ATGAAGGCCT TCGAGTCCCT CAAGTCCTTC
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

AAACTCACGG TCGGTGCAGC GGCTCCTCAG

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10366 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 316..9748

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TTGCTGTGTG AGGCAGAACC TGCGGGGGCA GGGGCGGGCT GGTTCCCTGG CCAGCCATTG 60

GCAGAGTCCG CAGGCTAGGG CTGTCAATCA TGCTGGCCGG CGTGGCCCCG CCTCCGCCGG 120

CGCGGCCCCG CCTCCGCCGG CGCACGTCTG GGACGCAAGG CGCCGTGGGG GCTGCCGGGA 180

CGGGTCCAAG ATGGACGGCC GCTCAGGTTC TGCTTTTACC TGCGGCCCAG AGCCCCATTC 240

ATTGCCCCGG TGCTGAGCGG CGCCGCGAGT CGGCCCGAGG CCTCCGGGGA CTGCCGTGCC 300



GGGCGGGAGA CCGCC ATG GCG ACC CTG GAA AAG CTG ATG AAG GCC TTC GAG 351
Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu
1 5 10

TCC CTC AAG TCC TTC CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG 399
Ser Leu Lys Ser Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln
15 20 25

CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CCG CCA CCG CCG 447
Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Pro
30 35 40

CCG CCG CCG CCG CCG CCT CCT CAG CTT CCT CAG CCG CCG CCG CAG GCA 495
Pro Pro Pro Pro Pro Pro Pro Gln Leu Pro Gln Pro Pro Pro Gln Ala
45 50 55 60

CAG CCG CTG CTG CCT CAG CCG CAG CCG CCC CCG CCG CCG CCC CCG CCG 543
Gln Pro Leu Leu Pro Gln Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro
65 70 75

CCA CCC GGC CCG GCT GTG GCT GAG GAG CCG CTG CAC CGA CCA AAG AAA 591
Pro Pro Gly Pro Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys
80 85 90

GAA CTT TCA GCT ACC AAG AAA GAC CGT GTG AAT CAT TGT CTG ACA ATA 639
Glu Leu Ser Ala Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile
95 100 105

TGT GAA AAC ATA GTG GCA CAG TCT GTC AGA AAT TCT CCA GAA TTT CAG 687
Cys Glu Asn Ile Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln
110 115 120

AAA CTT CTG GGC ATC GCT ATG GAA CTT TTT CTG CTG TGC AGT GAT GAC 735
Lys Leu Leu Gly Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp
125 130 135 140

GCA GAG TCA GAT GTC AGG ATG GTG GCT GAC GAA TGC CTC AAC AAA GTT 783
Ala Glu Ser Asp Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val
145 150 155

ATC AAA GCT TTG ATG GAT TCT AAT CTT CCA AGG TTA CAG CTC GAG CTC 831
Ile Lys Ala Leu Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu
160 165 170

TAT AAG GAA ATT AAA AAG AAT GGT GCC CCT CGG AGT TTG CGT GCT GCC 879
Tyr Lys Glu Ile Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala
175 180 185

CTG TGG AGG TTT GCT GAG CTG GCT CAC CTG GTT CGG CCT CAG AAA TGC 927
Leu Trp Arg Phe Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys
190 195 200

AGG CCT TAC CTG GTG AAC CTT CTG CCG TGC CTG ACT CGA ACA AGC AAG 975
Arg Pro Tyr Leu Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys
205 210 215 220

AGA CCC GAA GAA TCA GTC CAG GAG ACC TTG GCT GCA GCT GTT CCC AAA 1023
Arg Pro Glu Glu Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys
225 230 235

ATT ATG GCT TCT TTT GGC AAT TTT GCA AAT GAC AAT GAA ATT AAG GTT 1071
Ile Met Ala Ser Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val
240 245 250

TTG TTA AAG GCC TTC ATA GCG AAC CTG AAG TCA AGC TCC CCC ACC ATT 1119
Leu Leu Lys Ala Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile
255 260 265

CGG CGG ACA GCG GCT GGA TCA GCA GTG AGC ATC TGC CAG CAC TCA AGA 1167
Arg Arg Thr Ala Ala Gly Ser Ala Val Ser Ile Cys Gln His Ser Arg
270 275 280

AGG ACA CAA TAT TTC TAT AGT TGG CTA CTA AAT GTG CTC TTA GGC TTA 1215
Arg Thr Gln Tyr Phe Tyr Ser Trp Leu Leu Asn Val Leu Leu Gly Leu
285 290 295 300

CTC GTT CCT GTC GAG GAT GAA CAC TCC ACT CTG CTG ATT CTT GGC GTG 1263
Leu Val Pro Val Glu Asp Glu His Ser Thr Leu Leu Ile Leu Gly Val
305 310 315

CTG CTC ACC CTG AGG TAT TTG GTG CCC TTG CTG CAG CAG CAG GTC AAG 1311
Leu Leu Thr Leu Arg Tyr Leu Val Pro Leu Leu Gln Gln Gln Val Lys
320 325 330

GAC ACA AGC CTG AAA GGC AGC TTC GGA GTG ACA AGG AAA GAA ATG GAA 1359
Asp Thr Ser Leu Lys Gly Ser Phe Gly Val Thr Arg Lys Glu Met Glu
335 340 345

GTC TCT CCT TCT GCA GAG CAG CTT GTC CAG GTT TAT GAA CTG ACG TTA 1407
Val Ser Pro Ser Ala Glu Gln Leu Val Gln Val Tyr Glu Leu Thr Leu
350 355 360

CAT CAT ACA CAG CAC CAA GAC CAC AAT GTT GTG ACC GGA GCC CTG GAG 1455
His His Thr Gln His Gln Asp His Asn Val Val Thr Gly Ala Leu Glu
CTG TTG GAG CAG CTC TTC AGA ACG CCT CCA CCC GAG CTT CTG CAA ACC
Leu Leu Gln Gln Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr
385 390 395 

CTG ACC GCA GTC GGG GGC ATT GGG CAG CTC ACC GCT GCT AAG GAG GAr 
Leu Thr Ala Val Gly Gly He Gly Gin Leu Thr A^a A^a Glu ctu 

405 4i0 

TCT GGT GGC CGA AGC CGT AGT GGG AGT ATT GTG GAA CTT ATA GCT GGA 
Ser Gly Gly Arg Ser Arg Ser Gly Ser He Val Gl^ Leu Ue A^a oty 

420 425 

GGG GGT TCC TCA TGC AGC CCT GTC CTT TCA AGA AAA CAA AAA GGC AAA 

15 ^ 430 "^^^ ^^"^ 4^5 ""^^ ^^"^ '^^'^ "^^^ ^ 

GTG CTC TTA GGA GAA GAA GAA GCC TTG GAG GAT GAC TCT GAA TCG AGA 
val Leu Leu Gly Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg 

455 460 

^tl ?Z^ c^^ o^"^ TCA GTG AAG GAT GAG ATC 

Ser Asp Val Ser Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu He 
465 470 475 

ser G?v ri?. a?^ I" I^"" ^^^^ TCA GCA 

ser Gly Glu Leu Ala Ala Ser Ser Gly Val Ser Thr Pro Gly S-r Ala 

485 490 

GGT CAT GAC ATC ATC ACA GAA CAG CCA CGG TCA CAG CAC ACA CTG CAG 
Gly His Asp He He Thr Glu Gin Pro Arg Ser Gin His Thr Leu Gin 
495 500 505 

GCG GAC TCA CTG GAT CTG GCC AGC TGT GAC TTG ACA AGC TCT GCC ACT 
Ala Asp Ser Leu Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr 

510 515 520 

GAT GGG GAT GAG GAG GAT ATC TTG AGC CAC AGC TCC AGC CAG GTC AGC 
Asp Gly Asp Glu Glu Asp He Leu Ser His Ser Ser Ser Gin Val Ser 

530 535 540 

GCC GTC CCA TCT GAC CCT GCC ATG GAC CTG AAT GAT GGG ACC CAG GCC 
Ala Val Pro Ser Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gin Ala 
545 550 555 

TCG TCG CCC ATC AGC GAC AGC TCC CAG ACC ACC ACC GAA GGG CCT GAT 
Ser Ser Pro lie Ser Asp Ser Ser Gin Thr Thr Thr Glu Gly Pro Asp 
40 560 565 570 

TCA GCT GTT ACC CCT TCA GAC AGT TCT GAA ATT GTG TTA GAC GGT ACC 
Ser Ala Val Thr Pro Ser Asp Ser Ser Glu He Val Leu Asp Gly Thr 
575 580 585 

TAT TTG GGC CTG CAG ATT GGA CAG CCC CAG GAT GAA GAT 2127 
45 Asp Asn Gin Tyr Leu Gly Leu Gin He Gly Gin Pro Gin Asp Glu Asp 
590 595 600 

GAG GAA GCC ACA GGT ATT CTT CCT GAT CAA GCC TCG CAG CCC TTC AGG 2175 
Glu Glu Ala Thr Gly He Leu Pro Asp Glu Ala Ser Glu Ala Phe Arg 
^05 610 615 620 

AAC TCT TCC ATG GCC CTT CAA CAG GCA CAT TTA TTG AAA AAC ATG AGT 2223 
Asn Ser Ser Met Ala Leu Gin Gin Ala His Leu Leu Lys Asn Met Ser 
625 630 635 

CAC TGC AGG CAG CCT TCT GAC AGC AGT GTT GAT AAA TTT GTG TTG AGA 2271 
His Cys Arg Gin Pro Ser Asp Ser Ser Val Asp Lys Phe Val Leu Arg 

^40 645 650 

GAT GAA GCT ACT GAA CCG GGT GAT CAA GAA AAC AAG CCT TGC CGC ATC 2319 
Asp Glu Ala Thr Glu Pro Gly Asp Gin Glu Asn Lys Pro Cys Arg He 
655 660 665

AAA GGT GAC ATT GGA CAG TCC ACT GAT GAT GAC TCT GCA CCT CTT GTC 2367
Lys Gly Asp Ile Gly Gln Ser Thr Asp Asp Asp Ser Ala Pro Leu Val
670 675 680

CAT TCT GTC CGC CTT TTA TCT OCT TCG TTT TTG CTA ACA GGG GGA AAA 2415 

His Ser Val Arg Leu Leu Ser Ala Ser Phe Leu Leu Thr Gly Gly Lys 
685 690 695 700 

10 AAT GTG CTG GTT CCG GAC AGG GAT GTG AGG GTC AGC GTG AAG GCC CTG 2463 

Asn Val Leu Val Pro Asp Arg Asp Val Arg Val Ser Val Lys Ala Leu 
705 710 715 



GCC CTC AGC TOT GTG GGA GCA OCT GTG GCC CTC CAC CCG GAA TCT TTC 2511 
Ala Leu Ser Cys Val Gly Ala Ala Val Ala Leu His Pro Glu Ser Phe 
720 725 730 

TTC AGC AAA CTC TAT AAA GTT CCT CTT GAC ACC ACG GAA TAC CCT GAG 2559 
Phe Ser Lys Leu Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pro Glu 
735 740 745 

GAA CAG TAT GTC TCA GAC ATC TTG AAC TAC ATC GAT CAT GGA GAC CCA 2607 
Glu Gin Tyr Val Ser Asp lie Leu Asn Tyr lie Asp His Gly Asp Pro 
750 755 760 

CAG GTT CGA GGA GCC ACT GCC ATT CTC TGT GGG ACC CTC ATC TGC TCC 2655 
Gin Val Arg Gly Ala Thr Ala lie Leu Cvs Gly Thr Leu lie Cys Ser 

V65 770 ' 775 780 

ATC CTC AGC AGG TCC CGC TTC CAC GTG GGA GAT TGG ATG GGC ACC ATT 2 7 03 

lie Leu Ser Arg Ser Arg Phe His Val Gly Asp Trp Met Gly Thr lie 
785 790 795 

AGA ACC CTC ACA GGA AAT ACA TTT TCT TTG GCG GAT TGC ATT CCT TTG 2751 
Arg Thr Leu Thr Gly Asn Thr Phe Ser Leu Ala Asp Cys He Pro Leu 
800 805 810 

CTG CGG AAA ACA CTG AAG GAT GAG TCT TCT GTT ACT TGC AAG TTA GCT 2799 
Leu Arg Lys Thr Leu Ly^ Asp Glu Ser Ser Val Thr Cys Lys Leu Ala 
815 820 825 

35 TGT ACA GCT GTG AGG AAC TGT GTC ATG AGT CTC TGC AGC AGC AGC TAC 2 847 

Cys Thr Ala Val Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr 
830 835 840 

AGT GAG TTA GGA CTG CAG CTG ATC ATC GAT GTG CTG ACT CTG AGG AAC 28 95 

Ser Glu Leu Gly Leu Gin Leu He lie Asp Val Leu Thr Leu Arg Asn 
^ 845 850 855 860 

AGT TCC TAT TGG CTG GTG AGG ACA GAG CTT CTG GAA ACC CTT GCA GAG 2943 
Ser Ser Tyr Trp Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu 
865 870 875 

ATT GAC TTC AGG CTG GTG AGC TTT TTG GAG GCA AAA GCA GAA AAC TTA 29 91 

45 He Asp Phe Arg Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu 
880 885 890 

CAC AGA GGG GCT CAT CAT TAT ACA GGG CTT TTA AAA CTG CAA GAA CGA 3 03 9 

Kis Arg Gly Ala His His Tyr Thr Gly Leu Leu Lys Leu Gin Glu Arg 
895 900 905 



GTG CTC AAT AAT GTT GTC ATC CAT TTG CTT GGA GAT GAA GAC CCC AGG 3 0 87 

Val Leu Asn Asn Val Val He His Leu Leu Gly Asp Glu Asp Pro Arg 
910 915 . 920 



GTG CGA CAT GTT GCC GCA GCA TCA CTA ATT AGG CTT GTC CCA AAG CTG 3135 
Val Arg His Val Ala Ala Ala Ser Leu He Arg Leu Val Pro Lys Leu 
55 9 2 5 9 3 0 9 3 5 9 4 0 

TTT TAT AAA TGT GAC CAA GGA CAA GCT GAT CCA GTA GTG GCC GTG GCA 3183 
Phe Tyr Lys Cys Asp Gin Gly Gin Ala Asp Pro Val Val Ala Val Ala 
945 950 955

AGA GAT CAA AGC AGT GTT TAG CTG AAA CTT CTC ATG CAT GAG ACG CAG 3231 
Arg Asp Gin Ser Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gin 
960 965 970 

CCT CCA TCT CAT TTC TCC GTC AGC ACA ATA ACC AGA ATA TAT AGA GGC 3 27 9 

Pro Pro Ser His Phe Ser Val Ser Thr He Thr Arg lie Tyr Arg Gly 
975 930 985 

TAT AAC CTA CTA CCA AGC ATA ACA GAC GTC ACT ATG GAA AAT AAC CTT 3 3 27 

Tyr Asn Leu Leu Pro Ser He Thr Asp Val Thr Met Glu Asn Asn Leu 
990 995 1000 

TCA AGA GTT ATT GCA OCA GTT TCT CAT GAA CTA ATC ACA TCA ACC ACC 3 3 75 

Ser Arg Val He Ala Ala Val Ser His Glu Leu He Thr Ser Thr Thr 
1005 1010 1015 1020 

AGA GCA CTC ACA TTT GGA TGC T3T GAA GCT TTG TGT CTT CTT TCC ACT 3423 
Arg Ala Leu Thr Phe Gly Cys Cys Glu Ala Leu Cys Leu Leu Ser Thr 
1025 1030 1035 

20 GCC TTC CCA GTT TGC ATT TGG AGT TTA GGT TGG CAC TGT GGA GTG CCT 3471 

Ala Phe Pro Val Cys He Trp Ser Leu Gly Trp His Cys Gly Val Pro 
1040 1045 1050 

CCA CTG AGT GCC TCA GAT GAG TCT AGG AAG AGC TGT ACC GTT GGG ATG 3519 
Pro Leu Ser Ala Ser Asp Glu Ser Arg Lys Ser Cys Thr Val Gly Met 
25 1055 1060 1065 

GCC ACA ATG ATT CTG ACC CTG CTC TCG TCA GCT TGG TTC CCA TTG GAT 3567 
Ala Thr Met He Leu Thr Leu Leu Ser Ser Ala Trp Phe Pro Leu Asp 
1070 1075 1080 

CTC TCA GCC CAT CAA GAT GCT TTG ATT TTG GCC GGA AAC TTG CTT GCA 3 615 

30 Leu Ser Ala His Gin Asp Ala Leu He Leu Ala Gly Asn Leu Leu Ala 
1085 1090 1095 1100 

GCC AGT GCT CCC AAA TCT CTG AGA AGT TCA TGG GCC TCT GAA GAA GAA 3 663 

Ala Ser Ala Pro Lys Ser Leu Arg Ser Ser Trp Ala Ser Glu Glu Glu 
1105 1110 1115 

^ GCC AAC CCA GCA GCC ACC AAG CAA GAG GAG GTC TGG CCA GCC CTG GGG 3 711 

Ala Asn Pro Ala Ala Thr Lys Gin Glu Glu Val Trp Pro Ala Leu Gly 
1120 1125 1130 

GAC CGG GCC CTG GTG CCC ATG GTG GAG CAG CTC TTC TCT CAC CTG CTG 3 759 

Asp Arg Ala Leu Val Pro Met Val Glu Gin Leu Phe Ser His Leu Leu 
40 1135 1140 1145 

AAG GTG ATT AAC ATT TGT GCC CAC GTC CTG GAT GAC GTG GCT CCT GGA 3 807 

Lys Val He Asn He Cys Ala His Val Leu Asp Asp Val Ala Pro Gly 
1150 1155 1160 

CCC GCA ATA AAG GCA GCC TTG CCT TCT CTA ACA AAC CCC CCT TCT CTA 3 855 

45 Pro Ala He Lys Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu 
1165 1170 1175 1180 

AGT CCC ATC CGA CGA AAG GGG AAG GAG AAA GAA CCA GGA GAA CAA GCA 3 903 

Ser Pro He Arg Arg Lys Gly Lys Glu Lys Glu Pro Gly Glu Gin Ala 
1185 1190 1195 


TCT GTA CCG TTG AGT CCC AAG AAA GGC AGT GAG GCC AGT GCA GCT TCT 3 951 

Ser Val Pro Leu Ser Pro Lys Lys Gly Ser Glu Ala Ser Ala Ala Ser 
1200 1205 1210 

AGA CAA TCT GAT ACC TCA GGT CCT GTT ACA ACA AGT AAA TCC TCA TCA 3 99 9 

Arg Gin Ser Asp Thr Ser Gly Pro Val Thr Thr Ser Lys Ser Ser Ser 
55 1215 1220 1225 

CTG GGG AGT TTC TAT CAT CTT CCT TCA TAC CTC AGA CTG CAT GAT GTC 4 047 

Leu Gly Ser Phe Tyr His Leu Pro Ser Tyr Leu Arg Leu His Asp Val 




1230 1235 1240 

CTG AAA GCT ACA CAC GCT AAC TAC AAG GTC ACG CTG GAT CTT CAG AAC 
Leu Lys Ala Thr His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gin Asn 
1245 1250 1255 1260 






AGC ACG GAA AAG TTT GGA GGG TTT CTC CGC TCA GCC TTG GAT GTT CTT 
Ser Thr Glu Lys Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu 
1265 1270 1275 

10 TCT CAG ATA CTA GAG CTG GCC ACA CTG CAG GAC ATT GGG AAG TGT GTT 
Ser Gin lie Leu Glu Leu Ala Thr Leu Gin Asp lie Gly Lys Cys Val 
1280 1285 1290 

GAA GAG ATC CTA GGA TAC CTG AAA TCC TGC TTT AGT CGA GAA CCA ATG 
Glu Glu lie Leu Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met 
^5 1295 1300 1305 

ATG GCA ACT GTT TGT GTT CAA CAA TTG TTG AAG ACT CTC TTT GGC ACA 
Met Ala Thr Val Cys Val Gin Gin Leu Leu Lys Thr Leu Phe Gly Thr 
1310 1315 1320 

AAC TTG GCC TCC CAG TTT GAT GGC TTA TCT TCC AAC CCC AGC AAG TCA 
Asn Leu Ala Ser Gin Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser 
1325 1330 1335 1340 

CAA GGC CGA GCA CAG CGC CTT GGC TCC TCC AGT GTG AGG CCA GGC TTG 
Gin Gly Arg Ala Gin Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu 
1345 1350 1355 

TAC CAC TAC TGC TTC ATG GCC CCG TAC ACC CAC TTC ACC CAG GCC CTC 
Tyr His Tyr Cys Phe Met Ala Pro Tyr Thr His Phe Thr Gin Ala Leu 
1360 1365 1370 

GCT GAC GCC AGC CTG AGG AAC ATG GTG CAG GCG GAG CAG GAG AAC GAC 
Ala Asp Ala Ser Leu Arg Asn Met Val Gin Ala Glu Gin Glu Asn Asp 
^ 1375 1380 1385 

ACC TCG GGA TGG TTT GAT GTC CTC CAG AAA GTG TCT ACC CAG TTG AAG 
Thr Ser Gly Trp Phe Asp Val Leu Gin Lys Val Ser Thr Gin Leu Lys 
1390 1395 1400 

35 ACA AAC CTC ACG AGT GTC ACA AAG AAC CGT GCA GAT AAG AAT GCT ATT 
Thr Asn Leu Thr Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala lie 
1405 1410 1415 1420 

CAT AAT CAC ATT CGT TTG TTT GAA CCT CTT GTT ATA AAA GCT TTA AAA 
His Asn His lie Arg Leu Phe Glu Pro Leu Val lie Lys Ala Leu Lys 
^ 1425 1430 1435 

CAG TAC ACG ACT ACA ACA TGT GTG CAG TTA CAG AAG CAG GTT TTA GAT 
Gin Tyr Thr Thr Thr Thr Cys Val Gin Leu Gin Lys Gin Val Leu Asp 
1440 1445 1450 

TTG CTG GCG CAG CTG GTT CAG TTA CGG GTT AAT TAC TGT CTT CTG GAT 
45 Leu Leu Ala Gin Leu Val Gin Leu Arg Val Asn Tyr Cys Leu Leu Asp 
1455 1460 1465 

TCA GAT CAG GTG TTT ATT GGC TTT GTA TTG AAA CAG TTT GAA TAC ATT 
Ser Asp Gin Val Phe He Gly Phe Val Leu Lys Gin Phe Glu Tyr He 
1470 1475 1480 

^ GAA GTG GGC CAG TTC AGG GAA TCA GAG GCA ATC ATT CCA AAC ATC TTT 

Glu Val Gly Gin Phe Arg Glu Ser Glu Ala He He Pro Asn He Phe 
1485 1490 1495 1500 

TTC TTC TTG GTA TTA CTA TCT TAT GAA CGC TAT CAT TCA AAA CAG ATC 
Phe Phe Leu Val Leu Leu Ser Tyr Glu Arg Tyr His. Ser Lys Gin He 
55 1505 1510 1515 

ATT GGA ATT CCT AAA ATC ATT CAG CTC TGT GAT GGC ATC ATG GCC AGT 
He Gly He Pro Lys He He Gin Leu Cys Asp Gly He Met Ala Ser 
1520 1525 1530

GGA AGG AAG GCT GTG ACA CAT GCC ATA CCG GCT CTG GAG CCC ATA GTC 4959
Gly Arg Lys Ala Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val
1535 1540 1545






CAC GAC CTC TTT GTA TTA AGA GGA ACA AAT AAA GCT GAT GCA GGA AAA 5007
His Asp Leu Phe Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys
1550 1555 1560

GAG CTT GAA ACC CAA AAA GAG GTG GTG GTG TCA ATG TTA CTG AGA CTC 5055
Glu Leu Glu Thr Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu
1565 1570 1575 1580

ATC GAG TAC CAT CAG GTG TTG GAG ATG TTC ATT CTT GTC CTG CAG CAG 5103
Ile Gln Tyr His Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln
1585 1590 1595

TGC CAC AAG GAG AAT GAA GAC AAG TGG AAG CGA CTG TCT CGA CAG ATA 5151
Cys His Lys Glu Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile
1600 1605 1610

GCT GAC ATC ATC CTC CCA ATG TTA GCC AAA CAG CAG ATG CAC ATT GAC 5199
Ala Asp Ile Ile Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp
1615 1620 1625

TCT CAT GAA GCC CTT GGA GTG TTA AAT ACA TTA TTT GAG ATT TTG GCC 5247
Ser His Glu Ala Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala
1630 1635 1640

CCT TCC TCC CTC CGT CCG GTA GAC ATG CTT TTA CGG AGT ATG TTC GTC 5295
Pro Ser Ser Leu Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val
1645 1650 1655 1660

ACT CCA AAC ACA ATG GCG TCC GTG AGC ACT GTT CAA CTG TGG ATA TCG 5343
Thr Pro Asn Thr Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser
1665 1670 1675

GGA ATT CTG GCC ATT TTG AGG GTT CTG ATT TCC CAG TCA ACT GAA GAT 5391
Gly Ile Leu Ala Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp
1680 1685 1690

ATT GTT CTT TCT CGT ATT CAG GAG CTC TCC TTC TCT CCG TAT TTA ATC 5439
Ile Val Leu Ser Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile
1695 1700 1705

TCC TGT ACA GTA ATT AAT AGG TTA AGA GAT GGG GAC AGT ACT TCA ACG 5487
Ser Cys Thr Val Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr
1710 1715 1720

CTA GAA GAA CAC AGT GAA GGG AAA CAA ATA AAG AAT TTG CCA GAA GAA 5535
Leu Glu Glu His Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu
1725 1730 1735 1740

ACA TTT TCA AGG TTT CTA TTA CAA CTG GTT GGT ATT CTT TTA GAA GAC 5583 
Thr Phe Ser Arg Phe Leu Leu Gin Leu Val Gly He Leu Leu Glu Asp 
1745 1750 1755 

ATT GTT ACA AAA CAG CTG AAG GTG GAA ATG AGT GAG CAG CAA CAT ACT 5631 
He Val Thr Lys Gin Leu Lys Val Glu Met Ser Glu Gin Gin His Thr 
1760 1765 1770 

TTC TAT TGC CAG GAA CTA GGC ACA CTG CTA ATG TGT CTG ATC CAC ATC '^67 9 

Phe Tyr Cys Gin Glu Leu Gly Thr Leu Leu Met Cys Leu He His He 
1775 1780 1785 



TTC AAG TCT GGA ATG TTC CGG AGA ATC ACA GCA GCT GCC ACT AGG CTG 572 7 

Phe Lys Ser Gly Met Phe Arg Arg He Thr Ala Ala Ala Thr Arg Leu 
1790 1795 1800 

TTC CGC AGT GAT GGC TGT GGC GGC AGT TTC TAC ACC CTG GAC AGC TTG 5775 
Phe Arg Ser Asp Gly Cys Gly Gly Ser Phe Tyr Thr Leu Asp Ser Leu 
1805 1810 1815 1820

AAC TTG CGG OCT CGT TCC ATG ATC ACC ACC CAC CCG GCC CTG GTG CTG 
Asn Leu Arg Ala Arg Ser Met lie Thr Thr His Pro Ala Leu Val Leu 
1825 1830 1835 






CTC TGG TGT CAG ATA CTG CTG CTT GTC AAC CAC ACC GAC TAG CGC TGG 
Leu Trp Cys Gin lie Leu Leu Leu Val Asn His Thr Asp Tyr Arg Tzrp 
1840 1845 1850 

TGG GCA GAA GTG CAG CAG ACC CCG AAA AGA CAC AGT CTG TCC AGC ACA 
Trp Ala Glu Val Gin Gin Thr Pro Lys Arg His Ser Leu Ser Ser Thr 
1855 I860 1865 



AAG TTA CTT AGT CCC CAG ATG TCT GGA GAA GAG GAG GAT TCT GAC TTG 

Lys Leu Leu Ser Pro Gin Met Ser Gly Glu Glu Glu Asp Ser Asp Leu 
15 1870 1875 1880 

GCA GCC AAA CTT GGA ATG TGC AAT AGA GAA ATA GTA CGA AGA GGG GCT 

Ala Ala Lys Leu Gly Met Cys Asn Arg Glu lie Val Arg Arg Gly Ala 
1885 1890 1895 1900 



CTC ATT CTC TTC TGT GAT TAT GTC TGT CAG AAC CTC CAT GAC TCC GAG 
Leu lie Leu Phe Cys Asp Tyr Val Cys Gin Asn Leu His Asp Ser Glu 
1905 1910 1915 

CAC TTA ACG TGG CTC ATT GTA AAT CAC ATT CAA GAT CTG ATC AGC CTT 
His Leu Thr Trp Leu lie Val Asn His lie Gin Asp Leu lie Ser Leu 
1920 1925 1930 

TCC CAC GAG CCT CCA GTA CAG GAC TTC ATC AGT GCC GTT CAT CGG AAC 
Ser His Glu Pro Pro Val Gin Asp Phe lie Ser Ala Val His Arg Asn 
1935 1940 1945 

TCT GCT GCC AGC GGC CTG TTC ATC CAG GCA ATT CAG TCT CGT TGT GAA 
Ser Ala Ala Ser Gly Leu Phe lie Gin Ala He Gin Ser Arg Cys Glu 
1950 1955 1960 

AAC CTT TCA ACT CCA ACC ATG CTG AAG AAA ACT CTT CAG TGC TTG GAG 
Asn Leu Ser Thr Pro Thr Met Leu Lys Lys Thr Leu Gin Cys Leu Glu 
1965 1970 1975 1980 

55 GGG ATC CAT CTC AGC CAG TCG GGA GCT GTG CTC ACG CTG TAT GTG GAC 

Gly He His Leu Ser Gin Ser Gly Ala Val Leu Thr Leu Tyr Val Asp 
1985 1990 1995 

AGG CTT CTG TGC ACC CCT TTC CGT GTG CTG GCT CGC ATG GTC GAC ATC 
Arg Leu Leu Cys Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp He 
40 2000 2005 2010 

CTT GCT TGT CGC CGG GTA GAA ATG CTT CTG GCT GCA AAT TTA CAG AGC 
Leu Ala Cys Arg Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gin Ser 
2015 2020 2025 

AGC ATG GCC CAG TTG CCA ATG GAA GAA CTC AAC AGA ATC CAG GAA TAG 
^ Ser Met Ala Gin Leu Pro Met Glu Glu Leu Asn Arg lie Gin Glu Tyr 
2030 2035 2040 

CTT CAG AGC AGC GGG CTC GCT CAG AGA CAC CAA AGG CTC TAT TCC CTG 
Leu Gin Ser Ser Gly Leu Ala Gin Arg His Gin Arg Leu Tyr Ser Leu 
2045 2050 2055 2060 


CTG GAC AGG TTT CGT CTC TCC ACC ATG CAA GAC TCA CTT AGT CCC TCT 
Leu Asp Arg Phe Arg Leu Ser Thr Met Gin Asp Ser Leu Ser Pro Ser 
2065 2070 2075 

CCT CCA GTC TCT TCC CAC CCG CTG GAC GGG GAT GGG CAC GTG TCA CTG 
Pro Pro Val Ser Ser His Pro Leu Asp Gly Asp Gly His Val Ser Leu 
55 2080 2085 2090 

GAA ACA GTG AGT CCG GAC AAA GAC TGG TAC GTT CAT CTT GTC AAA TCC 
Glu Thr Val Ser Pro Asp Lys Asp Trp Tyr Val His Leu Val Lys Ser 
2095 2100 2105

CAG TGT TGG ACC AGG TCA GAT TCT GCA CTG CTG GAA GGT GCA GAG CTG 6687
Gln Cys Trp Thr Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu
2110 2115 2120



GTG AAT CGG ATT CCT GCT GAA GAT ATG AAT GCC TTC ATG ATG AAC TCG 673 5 

Val Asn Arg He Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser 
2125 2130 2135 2140 

GAG TTC AAC CTA AGO CTG CTA GCT CCA T3C TTA AGC CTA GGG ATG AGT 6783 
Glu Phe Asn Leu Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser 
2145 2150 2155 

GAA ATT TCT GGT GGC CAG AAG AGT GCC CTT TTT GAA GCA GCC CGT GAG 6831 
Glu He Ser Gly Gly Gin Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu 
15 2160 2165 2170 

GTG ACT CTG GCC CGT GTG AGC GGC ACC GTG CAG CAG CTC CCT GCT GTC 6879 
Val Thr Leu Ala Arg Val Ser Gly Thr Val Gin Gin Leu Pro Ala Val 
2175 2180 2185 

CAT CAT GTC TTC CAG CCC GAG CTG CCT GCA GAG CCG GCG GCC TAC TGG 6927 
Kxs His Val Phe Gin Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Tro 
2190 2195 2200 

AGC AAG TTG AAT GAT CTG TTT GGG GAT GCT GCA CTG TAT CAG TCC CTG 6975 
Ser Lys Leu Asn Asp Leu Phe Gly Asp Ala Ala Leu Tyr Gin Ser Leu 

2205 2210 2215 2220 

CCC ACT CTG GCC CGG GCC CTG GCA CAG TAC CTG GTG GTG GTC TCC AAA 7023 
Pro Thr Leu Ala Arg Ala Leu Ala Gin Tyr Leu Val Val Val Ser Lys 
2225 2230 2235 

CTG CCC AGT CAT TTG CAC CTT CCT CCT GAG AAA GAG AAG GAC ATT GTG 7071 
30 Pro Glu Lys Glu Lys Asp He Val 

2240 2245 2250 

AAA TTC GTG GTG GCA ACC CTT GAG GCC CTG TCC TGG CAT TTG ATC CAT 7119 
Lys Phe Val Val Ala Thr Leu Glu Ala Leu Ser Trp His Leu He His 
2255 2260 2265 

35 GAG CAG ATC CCG CTG AGT CTG GAT CTC CAG GCA GGG CTG GAC TGC TGC 7167 

Glu Gin He Pro Leu Ser Leu Asp Leu Gin Ala Gly Leu Asp Cys Cys 
2270 2275 2280 

TGC CTG GCC CTG CAG CTG CCT GGC CTC TGG AGC GTG GTC TCC TCC ACA 7215 
Cys Leu Ala Leu Gin Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr 
40 2285 2290 2295 2300 

GAG TTT GTG ACC CAC GCC TGC TCC CTC ATC TAC TGT GTG CAC TTC ATC 7263 
Glu Phe Val Thr His Ala Cys Ser Leu He Tyr Cys Val His Phe He 
2305 2310 2315 

CTG GAG GCC GTT GCA GTG CAG CCT GGA GAG CAG CTT CTT AGT CCA GAA 7311 
45 Leu Glu Ala Val Ala Val Gin Pro Gly Glu Gin Leu Leu Ser Pro Glu 

2320 2325 2330 

AGA AGG ACA AAT ACC CCA AAA GCC ATC AGC GAG GAG GAG GAG GAA GTA 73 59 

Arg Arg Thr Asn Thr Pro Lys Ala He Ser Glu Glu Glu Glu Glu Val 
2335 2340 2345 



GAT CCA AAC ACA CAG AAT CCT AAG TAT ATC ACT GCA GCC TGT GAG ATG 74 07 

Asp Pro Asn Thr Gin Asn Pro Lys Tyr He Thr Ala Ala Cys Glu Met 

2350 2355 2360 



GTG GCA GAA ATG GTG GAG TCT CTG CAG TCG GTG TTG GCC TTG GGT CAT 74 55 

Val Ala Glu Met Val Glu Ser Leu Gin Ser Val Leu Ala Leu Gly His 
55 2365 2370 2375 2380 

AAA AGG AAT AGC GGC GTG CCG GCG TTT CTC ACG CCA TTG CTC AGG AAC 7 503 

Lys Arg Asn Ser Gly Val Pro Ala Phe Leu Thr Pro Leu Leu Arg Asn 

2385 2390 2395

ATC ATC ATC AGC CTG GCC CGC CTG CCC CTT GTC AAC AGC TAG ACA CGT 
He He He Ser Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg 
2400 2405 2410 






GTG CCC CCA CTG GTG TGG AAG CTT GGA TGG TCA CCC AAA CCG GGA GGG 
Val Pro Pro Leu Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly 
2415 2420 2425 

GAT TTT GGC ACA GGA TTC CCT GAG ATC CCC GTG GAG TTC CTC CAG GAA 
Asp Phe Gly Thr Ala Phe Pro Glu He Pro Val Glu Phe Leu Gin Glu 
2430 2435 2440 

AAG GAA GTC TTT AAG GAG TTC ATC TAC CGC ATC AAC ACA CTA GGC TGG 
Lys Glu Val Phe Lys Glu Phe He Tyr Arg He Asn Thr Leu Gly Trp 
^5 2445 2450 2455 2460 

ACC AGT CGT ACT CAG TTT GAA GAA ACT TGG GCC ACC CTC CTT GGT GTC 

Thr Ser Arg Thr Gin Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly val 
2465 2470 2475 






CTG GTG ACC CAG CCC CTC GTG ATG GAG CAG GAG GAG AGC CCA CCA GAA 
Leu Val Thr Gin Pro Leu Val Met Glu Gin Glu Glu Ser Pro Pro Glu 
2480 2485 2490 

GAA GAC ACA GAG AGO ACC CAG ATC AAC GTC CTG GCC GTG CAG GCC ATC 
Glu Asp Thr Glu Arg Thr Gin He Asn Val Leu Ala Val Gin Ala He 
2495 250O 2505 

ACC TCA CTG GTG CTC AGT GCA ATG ACT GTG CCT GTG GCC GGC AAC CCA 
Thr Ser Leu Val Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro 
2510 2515 2520 

GCT GTA AGC TGC TTG GAG CAG CAG CCC CGG AAC AAG CCT CTG AAA GCT 
Ala Val Ser Cys Leu Glu Gin Gin Pro Arg Asn Lys Pro Leu Lys Ala 
2525 2530 2535 2540 

CTC GAC ACC AGG TTT GGG AGG AAG CTG AGC ATT ATC AGA GGG ATT GTG 
Leu Asp Thr Arg Phe Gly Arg Lys Leu Ser He He Arg Gly He Val 
2545 2550 2555 

35 GAG CAA GAG ATT CAA GCA ATG GTT TCA AAG AGA GAG AAT ATT GCC ACC 

Glu Gin Glu He Gin Ala Met Val Ser Lys Arg Glu Asn He Ala Thr 
2560 2565 2570 

CAT CAT TTA TAT CAG GCA TGG GAT CCT GTC CCT TCT CTG TCT CCG GCT 
His His Leu Tyr Gin Ala Trp Asp Pro Val Pro Ser Leu Ser Pro Ala 
^ 2575 2580 2585 

ACT ACA GGT GCC CTC ATC AGC CAC GAG AAG CTG CTG CTA CAG ATC AAC 
Thr Thr Gly Ala Leu He Ser His Glu Lys Leu Leu Leu Gin He Asn 
2590 2595 2600 

CCC GAG CGG GAG CTG GGG AGC ATG AGC TAC AAA CTC GGC CAG GTG TCC 
45 Pro Glu Arg Glu Leu Gly Ser Met Ser Tyr Lys Leu Gly Gin Val Ser 

2605 2610 2615 2620 

ATA CAC TCC GTG TGG CTG GGG AAC AGC ATC ACA CCC CTG AGG GAG GAG 
He His Ser Val Trp Leu Gly Asn Ser He Thr Pro Leu Arg Glu Glu 
2625 2630 2635 






GAA TGG GAC GAG GAA GAG GAG GAG GAG GCC GAC GCC CCT GCA CCT TCG 
Glu Trp Asp Glu Glu Glu Glu Glu Glu Ala Asp Ala Pro Ala Pro Ser 
2640 2645 2650 

TCA CCA CCC ACG TCT CCA GTC AAC TCC AGG AAA CAC CGG GCT GGA GTT 
Ser Pro Pro Thr Ser Pro Val Asn Ser Arg Lys His Arg Ala Gly Val 
2655 2660 2665 

GAC ATC CAC TCC TGT TCG CAG TTT TTG CTT GAG TTG TAC AGC CGC TGG 
Asp He His Ser Cys Ser Gin Phe Leu Leu Glu Leu Tyr Ser Arg Trp 
2670 2675 2680

ATC CTG CCG TCC AGC TCA GCC AGG AGG ACC CCG GCC ATC CTG ATC AGT 8415
Ile Leu Pro Ser Ser Ser Ala Arg Arg Thr Pro Ala Ile Leu Ile Ser
2685 2690 2695 2700



GAG GTG GTC AGA TCC CTT CTA GTG GTC TCA GAG TTG TTC ACC GAG CGC 8463 
Glu Val Val Arg Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Ara 
2705 2710 2715 



AAC GAG TTT GAG CTG ATG TAT GTG ACG CTG ACA GAA CTG CGA AGG GTG 8511 
Asn Gin Phe Glu Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val 
2720 2725 2730 

CAC CCT TCA GAA GAC GAG ATC CTC GCT CAG TAC CTG GTG CCT GCC ACC 855 9 

^5 Has Pro Ser Glu Asp Glu He Leu Ala Gin Tyr Leu Val Pro Ala Thr 

2735 2740 2745 

TGC AAG GCA GCT GCC GTC CTT GGG ATG GAC AAG GCC GTG GCG GAG CCT 8607 
Cys Lys Ala Ala Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro 
2750 2755 2760 

20 GTC AGC CGC CTG CTG GAG AGC ACG CTC AGG AGC AGC CAC CTG CCC AGC 8655 

Val Ser Arg Leu Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser 
2765 2770 2775 2780 

AGG GTT GGA GCC CTG CAC GGC ATC CTC TAT GTG CTG GAG TGC GAC CTG 8703 
Arg Val Gly Ala Leu His Gly lie Leu Tyr Val Leu Glu Cys Asp Leu 
25 2785 2790 2795 

CTG GAC GAC ACT GCC AAG CAG CTC ATC CCG GTC ATC AGC GAC TAT CTC 8751 
Leu Asp Asp Thr Ala Lys Gin Leu He Pro Val He Ser Asp Tyr Leu 
28C0 2805 2810 

CTC TCC AAC CTG AAA GGG ATC GCC CAC TGC GTG AAC ATT CAC AGC CAG 8799 
Leu Ser Asn Leu Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln
2815 2820 2825

CAG CAC GTA CTG GTC ATG TGT GCC ACT GCG TTT TAC CTC ATT GAG AAC 8847 
Gln His Val Leu Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn
2830 2835 2840 

TAT CCT CTG GAC GTA GGG CCG GAA TTT TCA GCA TCA ATA ATA CAG ATG 8895
Tyr Pro Leu Asp Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met
2845 2850 2855 2860

TGT GGG GTG ATG CTG TCT GGA AGT GAG GAG TCC ACC CCC TCC ATC ATT 8943
Cys Gly Val Met Leu Ser Gly Ser Glu Glu Ser Thr Pro Ser Ile Ile
2865 2870 2875

TAC CAC TGT GCC CTC AGA GGC CTG GAG CGC CTC CTG CTC TCT GAG CAG 8991
Tyr His Cys Ala Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln
2880 2885 2890

CTC TCC CGC CTG GAT GCA GAA TCG CTG GTC AAG CTG AGT GTG GAC AGA 9039 
Leu Ser Arg Leu Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg
2895 2900 2905

GTG AAC GTG CAC AGC CCG CAC CGG GCC ATG GCG GCT CTG GGC CTG ATG 9087 
Val Asn Val His Ser Pro His Arg Ala Met Ala Ala Leu Gly Leu Met 
2910 2915 2920 

CTC ACC TGC ATG TAC ACA GGA AAG GAG AAA GTC AGT CCG GGT AGA ACT 9135
Leu Thr Cys Met Tyr Thr Gly Lys Glu Lys Val Ser Pro Gly Arg Thr
2925 2930 2935 2940

TCA GAC CCT AAT CCT GCA GCC CCC GAC AGC GAG TCA GTG ATT GTT GCT 9183 
Ser Asp Pro Asn Pro Ala Ala Pro Asp Ser Glu Ser Val Ile Val Ala
2945 2950 2955

ATG GAG CGG GTA TCT GTT CTT TTT GAT AGG ATC AGG AAA GGC TTT CCT 9231 
Met Glu Arg Val Ser Val Leu Phe Asp Arg Ile Arg Lys Gly Phe Pro
2960 2965 2970

TGT GAA GCC AGA GTG GTG GCC AGG ATC CTG CCC CAG TTT CTA GAC GAC 9279
Cys Glu Ala Arg Val Val Ala Arg Ile Leu Pro Gln Phe Leu Asp Asp
2975 2980 2985



TTC TTC CCA CCC CAG GAC ATC ATG AAC AAA GTC ATC GGA GAG TTT CTG 9327
Phe Phe Pro Pro Gln Asp Ile Met Asn Lys Val Ile Gly Glu Phe Leu
2990 2995 3000

TCC AAC CAG CAG CCA TAC CCC CAG TTC ATG GCC ACC GTG GTG TAT AAG 9375
Ser Asn Gln Gln Pro Tyr Pro Gln Phe Met Ala Thr Val Val Tyr Lys
3005 3010 3015 3020

GTG TTT CAG ACT CTG CAC AGC ACC GGG CAG TCG TCC ATG GTC CGG GAC 9423 
Val Phe Gln Thr Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp
3025 3030 3035

TGG GTC ATG CTG TCC CTC TCC AAC TTC ACG CAG AGG GCC CCG GTC GCC 9471 
Trp Val Met Leu Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala
3040 3045 3050 

ATG GCC ACG TGG AGC CTC TCC TGC TTC TTT GTC AGC GCG TCC ACC AGC 9519 
Met Ala Thr Trp Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser 
3055 3060 3065 

CCG TGG GTC GCG GCG ATC CTC CCA CAT GTC ATC AGC AGG ATG GGC AAG 9567
Pro Trp Val Ala Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys
3070 3075 3080

CTG GAG CAG GTG GAC GTG AAC CTT TTC TGC CTG GTC GCC ACA GAC TTC 9615
Leu Glu Gln Val Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe
3085 3090 3095 3100

TAC AGA CAC CAG ATA GAG GAG GAG CTC GAC CGC AGG GCC TTC CAG TCT 9663
Tyr Arg His Gln Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser
3105 3110 3115

GTG CTT GAG GTG GTT GCA GCC CCA GGA AGC CCA TAT CAC CGG CTG CTG 9711 
Val Leu Glu Val Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu 
3120 3125 3130 



ACT TGT TTA CGA AAT GTC CAC AAG GTC ACC ACC TGC TGAGCGCCATG 9758
Thr Cys Leu Arg Asn Val His Lys Val Thr Thr Cys
3135 3140



GTGGGAGAGA CTGTGAGGCG GCAGCTGGGG CCGGAGCCTT TGGAAGTCTG TGCCCTTGTG 9818 

CCCTGCCTCC ACCGAGCCAG CTTGGTCCCT ATGGGCTTCC GCACATGCCG CGGGCGGCCA 9878

GGCAACGTGC GTGTCTCTGC CATGTGGCAG AAGTGCTCTT TGTGGCAGTG GCCAGGCAGG 9938

GAGTGTCTGC AGTCCTGGTG GGGCTGAGCC TGAGGCCTTC CAGAAAGCAG GAGCAGCTGT 9998

GCTGCACCCC ATGTGGGTGA CCAGGTCCTT TCTCCT3ATA GTCACCTGCT GGTTGTTGCC 10058 

AGGTTGCAGC TGCTCTTGCA TCTGGGCCAG AAGTCCTCCC TCCTGCAGGC TGGCTGTTGG 10118 

CCCCTCTGCT GTCCTGCAGT AGAAGGTGCC GTGAGCAGGC TTTGGGAACA CTGGCCTGGG 10178 

TCTCCCTGGT GGGGTGTGCA TGCCACGCCC CGTGTCTGGA TGCACAGATG CCATGGCCTG 10238

TGCTGGGCCA GTGGCTGGGG GTGCTAGACA CCCGGCACCA TTCTCCCTTC TCTCTTTTCT 10298

TCTCAGGATT TAAAATTTAA TTATATCAGT AAAGAGATTA ATTTTAACGT AAAAAAAAAA 10358 
AAAAAAAA 



(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 3144 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu Ser Leu Lys Ser
1 5 10 15

Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln
20 25 30

Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Pro Pro Pro Pro Pro
35 40 45

Pro Pro Pro Gln Leu Pro Gln Pro Pro Pro Gln Ala Gln Pro Leu Leu
50 55 60

Pro Gln Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Gly Pro
65 70 75 80

Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys Glu Leu Ser Ala
85 90 95

Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile Cys Glu Asn Ile
100 105 110

Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln Lys Leu Leu Gly
115 120 125

Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp Ala Glu Ser Asp
130 135 140

Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val Ile Lys Ala Leu
145 150 155 160

Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu Tyr Lys Glu Ile
165 170 175

Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala Leu Trp Arg Phe
180 185 190

Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys Arg Pro Tyr Leu
195 200 205

Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys Arg Pro Glu Glu
210 215 220

Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys Ile Met Ala Ser
225 230 235 240

Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val Leu Leu Lys Ala
245 250 255

Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile Arg Arg Thr Ala
260 265 270

Ala Gly Ser Ala Val Ser Ile Cys Gln His Ser Arg Arg Thr Gln Tyr
275 280 285

Phe Tyr Ser Trp Leu Leu Asn Val Leu Leu Gly Leu Leu Val Pro Val
290 295 300

Glu Asp Glu His Ser Thr Leu Leu Ile Leu Gly Val Leu Leu Thr Leu
305 310 315 320

Arg Tyr Leu Val Pro Leu Leu Gln Gln Gln Val Lys Asp Thr Ser Leu
325 330 335

Lys Gly Ser Phe Gly Val Thr Arg Lys Glu Met Glu Val Ser Pro Ser
340 345 350

Ala Glu Gln Leu Val Gln Val Tyr Glu Leu Thr Leu His His Thr Gln
355 360 365

His Gln Asp His Asn Val Val Thr Gly Ala Leu Glu Leu Leu Gln Gln
370 375 380

Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr Leu Thr Ala Val
385 390 395 400

Gly Gly Ile Gly Gln Leu Thr Ala Ala Lys Glu Glu Ser Gly Gly Arg
405 410 415

Ser Arg Ser Gly Ser Ile Val Glu Leu Ile Ala Gly Gly Gly Ser Ser
420 425 430

Cys Ser Pro Val Leu Ser Arg Lys Gln Lys Gly Lys Val Leu Leu Gly
435 440 445

Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg Ser Asp Val Ser
450 455 460

Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu Ile Ser Gly Glu Leu
465 470 475 480

Ala Ala Ser Ser Gly Val Ser Thr Pro Gly Ser Ala Gly His Asp Ile
485 490 495

Ile Thr Glu Gln Pro Arg Ser Gln His Thr Leu Gln Ala Asp Ser Leu
500 505 510

Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr Asp Gly Asp Glu
515 520 525

Glu Asp Ile Leu Ser His Ser Ser Ser Gln Val Ser Ala Val Pro Ser
530 535 540

Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gln Ala Ser Ser Pro Ile
545 550 555 560

Ser Asp Ser Ser Gln Thr Thr Thr Glu Gly Pro Asp Ser Ala Val Thr
565 570 575

Pro Ser Asp Ser Ser Glu Ile Val Leu Asp Gly Thr Asp Asn Gln Tyr
580 585 590

Leu Gly Leu Gln Ile Gly Gln Pro Gln Asp Glu Asp Glu Glu Ala Thr
595 600 605

Gly Ile Leu Pro Asp Glu Ala Ser Glu Ala Phe Arg Asn Ser Ser Met
610 615 620

Ala Leu Gln Gln Ala His Leu Leu Lys Asn Met Ser His Cys Arg Gln
625 630 635 640

Pro Ser Asp Ser Ser Val Asp Lys Phe Val Leu Arg Asp Glu Ala Thr
645 650 655

Glu Pro Gly Asp Gln Glu Asn Lys Pro Cys Arg Ile Lys Gly Asp Ile
660 665 670

Gly Gln Ser Thr Asp Asp Asp Ser Ala Pro Leu Val His Ser Val Arg
675 680 685

Leu Leu Ser Ala Ser Phe Leu Leu Thr Gly Gly Lys Asn Val Leu Val
690 695 700

Pro Asp Arg Asp Val Arg Val Ser Val Lys Ala Leu Ala Leu Ser Cys
705 710 715 720

Val Gly Ala Ala Val Ala Leu His Pro Glu Ser Phe Phe Ser Lys Leu
725 730 735

Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pro Glu Glu Gln Tyr Val
740 745 750

Ser Asp Ile Leu Asn Tyr Ile Asp His Gly Asp Pro Gln Val Arg Gly
755 760 765

Vl'^ Cys Gly Thr Leu Ile Cys Ser Ile Leu Ser Arg
770 775 780

Ser Arg Phe His Val Gly Asp Trp Met Gly Thr Ile Arg Thr Leu Thr
785 790 795 800

Gly Asn Thr Phe Ser Leu Ala Asp Cys Ile Pro Leu Leu Arg Lys Thr
805 810 815

Leu Lys Asp Glu Ser Ser Val Thr Cys Lys Leu Ala Cys Thr Ala Val
820 825 830

Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr Ser Glu Leu Gly
835 840 845

o^o Thr Leu Arg Asn Ser Ser Tyr Trp
850 855 860

Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu Ile Asp Phe Arg
865 870 875 880

Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu His Arg Gly Ala
885 890 895

His His Tyr Thr Gly Leu Leu Lys Leu Gln Glu Arg Val Leu Asn Asn
900 905 910

Val Val Ile His Leu Leu Gly Asp Glu Asp Pro Arg Val Arg His Val
915 920 925

tn« '^^^ Lys Leu Phe Tyr Lys Cys
930 935 940

Asp Gln Gly Gln Ala Asp Pro Val Val Ala Val Ala Arg Asp Gln Ser
945 950 955 960

Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gln Pro Pro Ser His
965 970 975

Phe Ser Val Ser Thr Ile Thr Arg Ile Tyr Arg Gly Tyr Asn Leu Leu
980 985 990

Pro Ser Ile Thr Asp Val Thr Met Glu Asn Asn Leu Ser Arg Val Ile
995 1000 1005

Ala Ala Val Ser His Glu Leu Ile Thr Ser Thr Thr Arg Ala Leu Thr
1010 1015 1020

^ n norr^-^^ Leu Ser Thr Ala Phe Pro Val
1025 1030 1035 1040

Cys Ile Trp Ser Leu Gly Trp His Cys Gly Val Pro Pro Leu Ser Ala
1045 1050 1055

Ser Asp Glu Ser Arg Lys Ser Cys Thr Val Gly Met Ala Thr Met Ile
1060 1065 1070

Leu Thr Leu Leu Ser Ser Ala Trp Phe Pro Leu Asp Leu Ser Ala His
1075 1080 1085

^f'^r.^^^ ^^"^ Asn Leu Leu Ala Ala Ser Ala Pro
1090 1095 1100

^y^^^^^ ^^'-^ Ser Trp Ala Ser Glu Glu Glu Ala Asn Pro Ala
1105 1110 1115 1120

Ala Thr Lys Gln Glu Glu Val Trp Pro Ala Leu Gly Asp Arg Ala Leu
1125 1130 1135

Val Pro Met Val Glu Gln Leu Phe Ser His Leu Leu Lys Val Ile Asn
1140 1145 1150

Ile Cys Ala His Val Leu Asp Asp Val Ala Pro Gly Pro Ala Ile Lys
1155 1160 1165

Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu Ser Pro Ile Arg
1170 1175 1180

Arg Lys Gly Lys Glu Lys Glu Pro Gly Glu Gln Ala Ser Val Pro Leu
1185 1190 1195 1200

Ser Pro Lys Lys Gly Ser Glu Ala Ser Ala Ala Ser Arg Gln Ser Asp
1205 1210 1215

Thr Ser Gly Pro Val Thr Thr Ser Lys Ser Ser Ser Leu Gly Ser Phe
1220 1225 1230

Tyr His Leu Pro Ser Tyr Leu Arg Leu His Asp Val Leu Lys Ala Thr
1235 1240 1245

His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gln Asn Ser Thr Glu Lys
1250 1255 1260

Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu Ser Gln Ile Leu
1265 1270 1275 1280

Glu Leu Ala Thr Leu Gln Asp Ile Gly Lys Cys Val Glu Glu Ile Leu
1285 1290 1295

Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met Met Ala Thr Val
1300 1305 1310

Cys Val Gln Gln Leu Leu Lys Thr Leu Phe Gly Thr Asn Leu Ala Ser
1315 1320 1325

Gln Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser Gln Gly Arg Ala
1330 1335 1340

Gln Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu Tyr His Tyr Cys
1345 1350 1355 1360

Phe Met Ala Pro Tyr Thr His Phe Thr Gln Ala Leu Ala Asp Ala Ser
1365 1370 1375

Leu Arg Asn Met Val Gln Ala Glu Gln Glu Asn Asp Thr Ser Gly Trp
1380 1385 1390

Phe Asp Val Leu Gln Lys Val Ser Thr Gln Leu Lys Thr Asn Leu Thr
1395 1400 1405

Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala Ile His Asn His Ile
1410 1415 1420

Arg Leu Phe Glu Pro Leu Val Ile Lys Ala Leu Lys Gln Tyr Thr Thr
1425 1430 1435 1440

Thr Thr Cys Val Gln Leu Gln Lys Gln Val Leu Asp Leu Leu Ala Gln
1445 1450 1455

Leu Val Gln Leu Arg Val Asn Tyr Cys Leu Leu Asp Ser Asp Gln Val
1460 1465 1470

Phe Ile Gly Phe Val Leu Lys Gln Phe Glu Tyr Ile Glu Val Gly Gln
1475 1480 1485

Phe Arg Glu Ser Glu Ala Ile Ile Pro Asn Ile Phe Phe Phe Leu Val
1490 1495 1500

Leu Leu Ser Tyr Glu Arg Tyr His Ser Lys Gln Ile Ile Gly Ile Pro
1505 1510 1515 1520

Lys Ile Ile Gln Leu Cys Asp Gly Ile Met Ala Ser Gly Arg Lys Ala
1525 1530 1535

Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val His Asp Leu Phe
1540 1545 1550

Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys Glu Leu Glu Thr
1555 1560 1565

Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu Ile Gln Tyr His
1570 1575 1580

Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln Cys His Lys Glu
1585 1590 1595 1600

Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile Ala Asp Ile Ile
1605 1610 1615

Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp Ser His Glu Ala
1620 1625 1630

Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala Pro Ser Ser Leu
1635 1640 1645

Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val Thr Pro Asn Thr
1650 1655 1660

Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser Gly Ile Leu Ala
1665 1670 1675 1680

Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp Ile Val Leu Ser
1685 1690 1695

Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile Ser Cys Thr Val
1700 1705 1710

Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr Leu Glu Glu His
1715 1720 1725

Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu Thr Phe Ser Arg
1730 1735 1740

Phe Leu Leu Gln Leu Val Gly Ile Leu Leu Glu Asp Ile Val Thr Lys
1745 1750 1755 1760

Gln Leu Lys Val Glu Met Ser Glu Gln Gln His Thr Phe Tyr Cys Gln
1765 1770 1775

Glu Leu Gly Thr Leu Leu Met Cys Leu Ile His Ile Phe Lys Ser Gly
1780 1785 1790

Met Phe Arg Arg Ile Thr Ala Ala Ala Thr Arg Leu Phe Arg Ser Asp
1795 1800 1805

Gly Cys Gly Gly Ser Phe Tyr Thr Leu Asp Ser Leu Asn Leu Arg Ala
1810 1815 1820

Arg Ser Met Ile Thr Thr His Pro Ala Leu Val Leu Leu Trp Cys Gln
1825 1830 1835 1840

Ile Leu Leu Leu Val Asn His Thr Asp Tyr Arg Trp Trp Ala Glu Val
1845 1850 1855

Gln Gln Thr Pro Lys Arg His Ser Leu Ser Ser Thr Lys Leu Leu Ser
1860 1865 1870

Pro Gln Met Ser Gly Glu Glu Glu Asp Ser Asp Leu Ala Ala Lys Leu
1875 1880 1885

Gly Met Cys Asn Arg Glu Ile Val Arg Arg Gly Ala Leu Ile Leu Phe
1890 1895 1900

Cys Asp Tyr Val Cys Gln Asn Leu His Asp Ser Glu His Leu Thr Trp
1905 1910 1915 1920

Leu Ile Val Asn His Ile Gln Asp Leu Ile Ser Leu Ser His Glu Pro
1925 1930 1935

Pro Val Gln Asp Phe Ile Ser Ala Val His Arg Asn Ser Ala Ala Ser
1940 1945 1950

Gly Leu Phe Ile Gln Ala Ile Gln Ser Arg Cys Glu Asn Leu Ser Thr
1955 1960 1965

Pro Thr Met Leu Lys Lys Thr Leu Gln Cys Leu Glu Gly Ile His Leu
1970 1975 1980

Ser Gln Ser Gly Ala Val Leu Thr Leu Tyr Val Asp Arg Leu Leu Cys
1985 1990 1995 2000

Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp Ile Leu Ala Cys Arg
2005 2010 2015

Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gln Ser Ser Met Ala Gln
2020 2025 2030

Leu Pro Met Glu Glu Leu Asn Arg Ile Gln Glu Tyr Leu Gln Ser Ser
2035 2040 2045

Gly Leu Ala Gln Arg His Gln Arg Leu Tyr Ser Leu Leu Asp Arg Phe
2050 2055 2060

Arg Leu Ser Thr Met Gln Asp Ser Leu Ser Pro Ser Pro Pro Val Ser
2065 2070 2075 2080

Ser His Pro Leu Asp Gly Asp Gly His Val Ser Leu Glu Thr Val Ser
2085 2090 2095

Pro Asp Lys Asp Trp Tyr Val His Leu Val Lys Ser Gln Cys Trp Thr
2100 2105 2110

Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu Val Asn Arg Ile
2115 2120 2125

Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser Glu Phe Asn Leu
2130 2135 2140

Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser Glu Ile Ser Gly
2145 2150 2155 2160

Gly Gln Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu Val Thr Leu Ala
2165 2170 2175

Arg Val Ser Gly Thr Val Gln Gln Leu Pro Ala Val His His Val Phe
2180 2185 2190

Gln Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Trp Ser Lys Leu Asn
2195 2200 2205

Asp Leu Phe Gly Asp Ala Ala Leu Tyr Gln Ser Leu Pro Thr Leu Ala
2210 2215 2220

Arg Ala Leu Ala Gln Tyr Leu Val Val Val Ser Lys Leu Pro Ser His
2225 2230 2235 2240

Leu His Leu Pro Pro Glu Lys Glu Lys Asp Ile Val Lys Phe Val Val
2245 2250 2255

Ala Thr Leu Glu Ala Leu Ser Trp His Leu Ile His Glu Gln Ile Pro
2260 2265 2270

Leu Ser Leu Asp Leu Gln Ala Gly Leu Asp Cys Cys Cys Leu Ala Leu
2275 2280 2285

Gln Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr Glu Phe Val Thr
2290 2295 2300

His Ala Cys Ser Leu Ile Tyr Cys Val His Phe Ile Leu Glu Ala Val
2305 2310 2315 2320

Ala Val Gln Pro Gly Glu Gln Leu Leu Ser Pro Glu Arg Arg Thr Asn
2325 2330 2335

Thr Pro Lys Ala Ile Ser Glu Glu Glu Glu Glu Val Asp Pro Asn Thr
2340 2345 2350

Gln Asn Pro Lys Tyr Ile Thr Ala Ala Cys Glu Met Val Ala Glu Met
2355 2360 2365

fl^^ Ser Leu Gln Ser Val Leu Ala Leu Gly His Lys Arg Asn Ser
2370 2375 2380

Gly Val Pro Ala Phe Leu Thr Pro Leu Leu Arg Asn Ile Ile Ile Ser
2385 2390 2395 2400

Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg Val Pro Pro Leu
2405 2410 2415

Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly Asp Phe Gly Thr
2420 2425 2430

Ala Phe Pro Glu Ile Pro Val Glu Phe Leu Gln Glu Lys Glu Val Phe
2435 2440 2445

Lys Glu Phe Ile Tyr Arg Ile Asn Thr Leu Gly Trp Thr Ser Arg Thr
2450 2455 2460

Gln Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly Val Leu Val Thr Gln
2465 2470 2475 2480

Pro Leu Val Met Glu Gln Glu Glu Ser Pro Pro Glu Glu Asp Thr Glu
2485 2490 2495

Arg Thr Gln Ile Asn Val Leu Ala Val Gln Ala Ile Thr Ser Leu Val
2500 2505 2510

Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro Ala Val Ser Cys
2515 2520 2525

^^"^ V^t^^"^ ^"""^ ^""^ Ala Leu Asp Thr Arg
2530 2535 2540

Phe Gly Arg Lys Leu Ser Ile Ile Arg Gly Ile Val Glu Gln Glu Ile
2545 2550 2555 2560

Gln Ala Met Val Ser Lys Arg Glu Asn Ile Ala Thr His His Leu Tyr
2565 2570 2575

Gln Ala Trp Asp Pro Val Pro Ser Leu Ser Pro Ala Thr Thr Gly Ala
2580 2585 2590

Leu Ile Ser His Glu Lys Leu Leu Leu Gln Ile Asn Pro Glu Arg Glu
2595 2600 2605

^^"^ V^^.r^^"" ^""'^ ^^"^ Ser Ile His Ser Val
2610 2615 2620

Trp Leu Gly Asn Ser Ile Thr Pro Leu Arg Glu Glu Glu Trp Asp Glu
2625 2630 2635 2640

Glu Glu Glu Glu Glu Ala Asp Ala Pro Ala Pro Ser Ser Pro Pro Thr
2645 2650 2655

Ser Pro Val Asn Ser Arg Lys His Arg Ala Gly Val Asp Ile His Ser
2660 2665 2670

Cys Ser Gln Phe Leu Leu Glu Leu Tyr Ser Arg Trp Ile Leu Pro Ser
2675 2680 2685

Ser Ser Ala Arg Arg Thr Pro Ala Ile Leu Ile Ser Glu Val Val Arg
2690 2695 2700

Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Arg Asn Gln Phe Glu
2705 2710 2715 2720

Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val His Pro Ser Glu
2725 2730 2735

Asp Glu Ile Leu Ala Gln Tyr Leu Val Pro Ala Thr Cys Lys Ala Ala
2740 2745 2750

Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro Val Ser Arg Leu
2755 2760 2765

Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser Arg Val Gly Ala
2770 2775 2780

Leu His Gly Ile Leu Tyr Val Leu Glu Cys Asp Leu Leu Asp Asp Thr
2785 2790 2795 2800

Ala Lys Gln Leu Ile Pro Val Ile Ser Asp Tyr Leu Leu Ser Asn Leu
2805 2810 2815

Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln Gln His Val Leu
2820 2825 2830

Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn Tyr Pro Leu Asp
2835 2840 2845

Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met Cys Gly Val Met
2850 2855 2860

Leu Ser Gly Ser Glu Glu Ser Thr Pro Ser Ile Ile Tyr His Cys Ala
2865 2870 2875 2880

Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln Leu Ser Arg Leu
2885 2890 2895

Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg Val Asn Val His
2900 2905 2910

Ser Pro His Arg Ala Met Ala Ala Leu Gly Leu Met Leu Thr Cys Met
2915 2920 2925

Tyr Thr Gly Lys Glu Lys Val Ser Pro Gly Arg Thr Ser Asp Pro Asn
2930 2935 2940

Pro Ala Ala Pro Asp Ser Glu Ser Val Ile Val Ala Met Glu Arg Val
2945 2950 2955 2960

Ser Val Leu Phe Asp Arg Ile Arg Lys Gly Phe Pro Cys Glu Ala Arg
2965 2970 2975

Val Val Ala Arg Ile Leu Pro Gln Phe Leu Asp Asp Phe Phe Pro Pro
2980 2985 2990

Gln Asp Ile Met Asn Lys Val Ile Gly Glu Phe Leu Ser Asn Gln Gln
2995 3000 3005

Pro Tyr Pro Gln Phe Met Ala Thr Val Val Tyr Lys Val Phe Gln Thr
3010 3015 3020

Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp Trp Val Met Leu
3025 3030 3035 3040

Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala Met Ala Thr Trp
3045 3050 3055

Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser Pro Trp Val Ala 
3060 3065 3070 

Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys Leu Glu Gln Val
3075 3080 3085

Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe Tyr Arg His Gln
3090 3095 3100

Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser Val Leu Glu Val
3105 3110 3115 3120

Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu Thr Cys Leu Arg 
3125 3130 3135 

Asn Val His Lys Val Thr Thr Cys 
3140

Claims

1. An isolated, purified or recombinant polypeptide comprising a huntingtin protein or a mutant, fragment
or variant thereof having substantially the same activity as huntingtin protein.

2. A polypeptide according to claim 1 having the amino acid sequence shown in SEQ ID NO:6. 

3. A polypeptide according to claim 1 or 2 which is essentially purified and/or has at least 5 contiguous amino 
acids. 

4. An isolated, purified or recombinant nucleic acid molecule comprising nucleic acid which is: 

(a) a sequence encoding a huntingtin protein according to any preceding claim (whether normal or ge- 
netically defective), or its complementary strand; 

(b) a sequence that is substantially homologous to, or hybridises under stringent conditions to, either 
sequence in (a); 

(c) a sequence that is substantially homologous to, or would hybridise under stringent conditions to, a
sequence in (a) or (b) but for the degeneracy of the genetic code;

or a fragment of any of (a), (b) or (c). 

5. A nucleic acid according to claim 1, wherein the huntingtin protein has the amino acid sequence shown 
in SEQ ID NO:6 and/or the nucleic acid is DNA encoding the amino acid sequence SEQ ID NO:5. 

6. A nucleic acid molecule according to claim 4 or 5 which is a probe for detecting the presence of huntingtin 
in a sample comprising being at least 5, such as at least 15, contiguous nucleotides. 

7. A (preferably recombinant) nucleic acid molecule according to any of claims 4 to 6 comprising a transcriptional
region functional in a cell operably linked to a sequence complementary to an RNA sequence encoding
a protein according to any of claims 1 to 3 or at least 5 contiguous amino acids thereof.

8. A vector comprising a nucleic acid molecule according to any of claims 4 to 7.

9. A vector according to claim 8 wherein the nucleic acid molecule, such as encoding huntingtin protein, is 
operably linked to transcriptional and/or translational expression signals. 

10. A host cell transformed or transfected with a vector according to claim 4 or 5. 

11. An antibody specific for huntingtin protein, or a protein according to any of claims 1 to 3. 

12. A hybridoma which produces an antibody according to claim 11. 

13. A method of detecting the presence of, or predisposition to develop, Huntington's disease in a subject,
the method comprising evaluating the characteristics of huntingtin nucleic acid in a sample from the subject,
for example in relation to the number of (CAG) repeats.

14. A method according to claim 13 comprising:

(a) taking a sample from the subject; 

(b) evaluating the characteristics of huntingtin nucleic acid in the sample, wherein the evaluation com- 
prises detecting the huntingtin (CAG)n region in the sample; and 

(c) comparing the characteristics found in (b) with a similar analysis from an individual not having, or 
not suspected of having, Huntington's disease; and 

(d) the presence of, or predisposition to develop, Huntington's disease being indicated if those char- 
acteristics in the huntingtin (CAG)n region differ. 

15. A method according to claim 13 comprising:

(a) taking a sample from a subject; and

(b) evaluating the characteristics of huntingtin nucleic acid comprising the huntingtin (CAG)n region in 
the sample by Southern blot, northern blot, or polymerase chain reaction analysis. 
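
By way of illustration only, and forming no part of the claims, the evaluation of the (CAG)n region recited in claims 13 to 15 might be sketched as follows once the region has been sequenced into a plain DNA string. The function names `longest_cag_run` and `differs_from_reference`, and the toy sequences, are hypothetical and are not taken from this specification; the sketch simply counts consecutive CAG triplets and compares a sample with a reference, as in steps (b) to (d) of claim 14.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the largest number of consecutive CAG triplets found in dna."""
    # Normalise the input: drop whitespace and use upper case.
    seq = re.sub(r"\s+", "", dna).upper()
    # Each match is one uninterrupted run of CAG triplets.
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


def differs_from_reference(sample: str, reference: str) -> bool:
    """Report whether the sample's CAG repeat count differs from the reference's."""
    return longest_cag_run(sample) != longest_cag_run(reference)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not data from this document.
    reference = "TCC" + "CAG" * 20 + "CCG"
    sample = "TCC" + "CAG" * 40 + "CCG"
    print(longest_cag_run(reference), longest_cag_run(sample))
    print(differs_from_reference(sample, reference))
```

A difference in repeat count is only the comparison step; how that difference is interpreted is governed by the description and claims, not by this sketch.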
16. The use of:

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which
encodes a functional (or non-defective) protein;

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective);

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective);
and/or

(d) an antagonist to, or a compound that binds to, huntingtin protein;

in the preparation of an agent for treating, delaying or preventing a neurodegenerative disorder.

17. The use according to claim 16 which is gene therapy.

18. The use according to claim 16 or 17 for treating, preventing or delaying Huntington's disease.

19. The use according to any of claims 16 to 17 wherein the nucleic acid has from 11 to 34 (CAG) repeats 
and/or the polypeptide has from 11 to 34 Gln repeats, said repeats being consecutive.

20. A diagnostic and/or immunoassay kit comprising at least one container and:

(a) a nucleic acid molecule according to any of claims 4 to 6, optionally labelled; or

(b) an antibody according to claim 11, optionally labelled. 

21. The use of:

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which
encodes a functional (or non-defective) protein;

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective);

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective);
and/or

(d) an antagonist to, or a compound that binds to, huntingtin protein;

in the preparation of a medicament.

22. A pharmaceutical composition comprising: 

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which
encodes a functional (or non-defective) protein;

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective);

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective);
and/or

(d) an antagonist to, or a compound that binds to, huntingtin protein;

in admixture with a pharmaceutically acceptable carrier.

23. A process for the preparation of a polypeptide, the process comprising culturing a host cell according to
claim 10 under conditions whereby the polypeptide is expressed, and purifying or isolating the polypep-
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